ABSTRACT

The demand for computers with ever greater throughput, coupled with
the decreased costs accompanying advances in semiconductor technology,
has created a great deal of interest in parallel processing systems.
Single instruction stream - multiple data stream (SIMD) machines and
multiple instruction stream - multiple data stream (MIMD) machines are
two types of parallel processing system architectures. PASM is a
partitionable SIMD/MIMD parallel processor, intended to operate in either
mode of parallelism, being developed at Purdue University. The
interconnection network chosen for this system will greatly influence its
performance. The Generalized Cube and the Augmented Data Manipulator
(ADM) are two networks being considered for use in PASM. This work is
primarily concerned with the capabilities of the ADM network in SIMD
mode.

The number of data permutations passable by the ADM network is
explored. First, the number of permutations performable by any stage is
counted. Using partitioning properties of the network and combinatorial
mathematics, this result is extended to permutations performable by the
entire network. For N = 8, an exact count of the number of performable
permutations is given. For N > 8, upper and lower bounds are given. A
comparison with the Generalized Cube network is made.

Routing tag schemes are described for both the Generalized Cube and
ADM networks. The number of data permutations passable by the ADM
network using positive dominant or negative dominant permutation routing
tags is counted. The number of permutations passable using natural
permutation routing tags is bounded.

Algorithms for determining permutation passability in the ADM
network using three related types of routing tags for distributed network
control are presented. Correctness proofs are given and algorithm
complexity is determined.

To further investigate ADM network capabilities in SIMD mode, group
theory is used to derive additional properties. It is shown that the
ADM network cannot pass all even permutations when $N \ge 8$.
CHAPTER 1
INTRODUCTION

Throughput has been, and remains, a major limiting factor of the
scope of data processing tasks performed by computer systems. Many
tasks of current interest, such as machine vision, image processing,
seismic exploration, air traffic control, and aerodynamic simulation,
could greatly benefit from performance in excess of that offered by
current computer systems.

Historically, computer system designers have attempted to meet the
demand for increased throughput by building new generations of machines
which, most often, differed from their predecessors only in circuit
switching speed. System architecture remained reasonably similar to the
basic von Neumann machine. To continue the significant gains made over
the years using this approach will require further major reductions in
circuit switching times. Indeed, circuits using the Josephson effect
promise to make picosecond switching times practical in the not too
distant future. But there is an ultimate limit to the switching speed of
a given circuit, determined by the propagation speed of electromagnetic
waves: the speed of light. So alternate methods of improving
throughput are of interest.

Throughput is not directly dependent on the circuit switching
speed. Ultimately, throughput is maximized on a given system when any
task takes only one instruction cycle to execute. To achieve a
reduction in the number of system instruction cycles, new machine
organizations and algorithm structures may be used.

In tandem with the improvements in circuit switching speeds,
significant circuit cost reductions have been realized. The reduced cost
of hardware has made large scale parallel processing systems feasible.
Such architectures are suited for problems that can be decomposed into
independent subtasks. Simultaneous execution of these subtasks allows a
reduction in the number of system instruction cycles needed to perform
the task. All of the problems listed previously as being
computationally intensive could benefit from parallel processing systems.

One type of parallel architecture is the single instruction stream
- multiple data stream (SIMD) system. Such machines typically consist
of N processors, N memories, an interconnection network, and a control
unit. The control unit broadcasts instructions to all processing
elements, and all active processors execute the same instruction
simultaneously. This is the single instruction stream. Each processor
executes these instructions on data stored in a memory with which it is
associated. This provides the multiple data stream. The interconnection
network serves to provide interprocessor communication.

A second type of parallel processor system is the multiple
instruction stream - multiple data stream (MIMD) machine. Again there are
typically N processors, N memories, and an interconnection network.
However, processors execute instructions from their own memories, thus
providing multiple instruction streams.

The interconnection network chosen for a parallel processing system
will greatly influence the performance of the machine. Many questions
about the capabilities of interconnection networks remain unanswered.
This is especially true for the SIMD system environment, where the
interconnection network will often be called upon to transfer information
among all N processors simultaneously, that is, to perform data
permutations. Poor performance in passing needed permutations could
render a particular interconnection network unsuitable for use in an SIMD
system by causing a serious degradation in processor utilization.

PASM, a partitionable SIMD/MIMD parallel processor, is a
reconfigurable multimicroprocessor system under development at Purdue
University. It is designed to operate in either mode of parallelism. Two
interconnection networks are being considered for use in PASM: the
Generalized Cube and the Augmented Data Manipulator (ADM). The Generalized
Cube has been studied. Various properties of the ADM have been examined,
but further investigation is needed. This work is concerned with the
capabilities of the ADM network in SIMD mode. Increased knowledge of
ADM network performance will allow a more informed choice of an
interconnection network for PASM.

The general model of SIMD parallel processing systems to be used
throughout this work is described in Chapter 2. Chapter 3 contains a
brief overview of PASM. In Chapter 4, the two networks are formally
defined. The setting of the networks to perform permutations is
discussed. Chapter 5 presents the argument for counting the number of
distinct permutations passable by the Generalized Cube network, for later
comparison with the ADM network.

Chapter 6 investigates one parameter of ADM network performance:
the number of passable permutations. This development involves both
properties of network topology and combinatorial mathematical
techniques. First, the number of permutations performable by a network
stage is counted. Then, using network partitioning, the arguments are
extended to provide upper and lower bounds on the number of data
permutations passable by the ADM network. An exact count is given for the
case N = 8. Finally, the asymptotic behavior of the bounds is analyzed
and a comparison with the number of Generalized Cube performable
permutations is made.

The use of routing tags for distributed network control is
discussed in Chapter 7. Routing tag schemes for the Generalized Cube are
described. The permuting capability of the network is not at all limited
by the routing tag control.

Chapter  8  reviews  two  families  of  routing  tags  for  the  ADM  network. 
A  count  of  the  number  of  permutations  passable  using  positive  dominant 
or  negative  dominant  permutation  routing  tags  is  given.  The  number  of 
permutations  passable  using  natural  permutation  routing  tags  is  bounded. 

Algorithms are developed in Chapter 9 to determine the passability
of an arbitrary permutation by the ADM network under distributed control
by any of three related routing tags. The correctness of the algorithms
is proven.

Further ADM network properties, derived using group theory, are
presented in Chapter 10. The permutations passable by the ADM are
additionally characterized as a result.
CHAPTER 2

MODEL  OF  SIMD  MACHINES 

The acronym SIMD stands for single instruction stream - multiple
data stream [FL]. Typically, an SIMD machine is a computer system
consisting of a control unit, N processors, N memory modules, and an
interconnection network. The control unit broadcasts instructions to
all of the processors, and all active processors execute the same
instruction at the same time. Thus, there is a single instruction stream.
Each active processor executes the instruction on data in its own
associated memory module. Thus, there is a multiple data stream. The
interconnection network, sometimes referred to as an alignment or
permutation network, provides a communication facility for the
processors and memory modules.

One way to model the physical structure of an SIMD machine is shown
in Figure 2.1. As indicated, there are N processing elements (PEs),
where each PE consists of a processor with its own memory. The PEs
receive their instructions from the control unit. Communication among the
PEs is accomplished through the use of the interconnection network.
This structure is called the PE-to-PE approach. The Illiac IV [BOU] is
an example of this configuration.

Because each processor has direct access to its local memory module
and relatively poorer access to any other memory module, tasks requiring
transfers of large blocks of data between PEs should be avoided.
Rather, algorithms involving a data base which can be partitioned into
largely noninteracting segments are most suited to this structure.
Communication between PEs can be supported by a unidirectional
interconnection network since each PE has access to a network input and
output.

Figure 2.1 A PE-to-PE model of an SIMD machine.

A second way to model an SIMD machine is shown in Figure 2.2. This
is the processor-to-memory approach. In general, there may be P
processors connected to M memories through the interconnection network in
this approach. The figure shows the case where there are N processors and
N memories. The BSP [JE] is an SIMD machine with this structure.

In this case, transfer of large blocks of data from processor to
processor is easily accomplished by using the interconnection network to
change the memory module linked to a given processor. One disadvantage
of this architecture is that each instruction or data fetch must pass
through the network. Another is that two processors can only
communicate through a shared memory module.

For the processor-to-memory structure, processors must be able to
perform memory read and write operations through the interconnection
network. If the processors and memories have fixed access to the
interconnection network, then it must support bidirectional communication.
A unidirectional interconnection network can be used if provision is made
to allow either processors or memories to be attached to both network
inputs and outputs.

Further information about SIMD machine structures is contained in
[ST]. Variations on the PE-to-PE and processor-to-memory architectures
are discussed in [BA] and [LA]. A mathematical model of SIMD machines
is presented in [SI5].

Figure 2.2 A processor-to-memory model of an SIMD machine.


The model of SIMD machines to be used in this and subsequent
chapters is the PE-to-PE model [SI5]. Each PE is assigned a unique
address from 0 to N-1, represented in binary as
$p_{n-1} p_{n-2} \cdots p_1 p_0$. The results obtained for the ADM
network will be valid for either model, however.
CHAPTER 3
OVERVIEW  OF  PASM 

There are several types of parallel processing systems. An SIMD
(single instruction stream - multiple data stream) machine typically
consists of a set of N processors, N memories, an interconnection
network, and a control unit (e.g. Illiac IV). The control unit broadcasts
instructions to the processors and all active ("turned on") processors
execute the same instruction at the same time. Each processor executes
instructions using data taken from a memory with which only it is
associated. The interconnection network allows interprocessor
communication. An MIMD (multiple instruction stream - multiple data stream)
machine usually consists of N processors and N memories, where each
processor can follow an independent instruction stream (e.g. C.mmp). As
with SIMD architectures, there is a multiple data stream and an
interconnection network. A partitionable SIMD/MIMD system is a parallel
processing system which can be structured as two or more independent SIMD
and/or MIMD machines. In this chapter, the basic organization of PASM,
a partitionable SIMD/MIMD system being designed at Purdue University for
image processing and pattern recognition, is briefly overviewed.

SIMD machines can be used for "local" processing of segments of
images in parallel. For example, the image can be segmented, and each
processor assigned a segment. Then, following the same set of
instructions, such tasks as line thinning, threshold dependent operations,
and gap filling can be done in parallel for all segments of the image
simultaneously. Also in SIMD mode, matrix arithmetic used for such tasks
as statistical pattern recognition can be done efficiently. MIMD machines
can be used to perform different "global" pattern recognition tasks in
parallel, using multiple copies of the image or one or more shared
copies. For example, in cases where the goal is to locate two or more
distinct objects in an image, each object can be assigned a processor or
set of processors to search for it. An SIMD/MIMD application might
involve using the same set of microprocessors for preprocessing an image
in SIMD mode and then doing a pattern recognition task in MIMD mode.

PASM is a special purpose, dynamically reconfigurable, large-scale
multimicroprocessor system. Due to the low cost of microprocessors,
computer system designers have been considering various
multimicrocomputer architectures. PASM was the first multimicroprocessor
system in the literature to combine the following features: (1) it can be
partitioned to operate as many independent SIMD and/or MIMD machines of
varying sizes; and (2) a variety of problems in image processing and
pattern recognition will be used to guide the design choices.

Figure 3.1 is a block diagram of the basic components of PASM. The
System Control Unit (SCU) is a conventional machine, such as a PDP-11,
and is responsible for the overall coordination of the activities of the
other components of PASM. By carefully choosing which tasks should be
assigned to the SCU and which should be assigned to other system
components (such as the Memory Management System), the SCU can work
effectively and not become a bottleneck.
The Parallel Computation Unit (PCU) contains $N = 2^n$ processors, N
memory modules, and an interconnection network. The PCU processors are
microprogrammable microprocessors that perform the actual SIMD and MIMD
computations. The PCU memory modules are used by the PCU processors for
data storage in SIMD mode and both data and instruction storage in MIMD
mode. A memory module is connected to each processor to form a
processor-memory pair called a processing element (PE), as shown in Figure
3.2. A pair of memory units is used for each memory module. This
double-buffering scheme allows data to be moved between one memory unit
and secondary storage (the Memory Storage System) while the processor
operates on data in the other memory unit.

The interconnection network provides a means of communication among
the PCU PEs. Two different interconnection networks are being
considered for PASM: the Generalized Cube and the ADM. Both consist of
$n = \log_2 N$ stages of switches and are controlled by routing tags. Both
can be partitioned into independent subnetworks if all of the PEs in a
partition of size $P = 2^p$ have the same value in the low-order n-p bit
positions of their addresses. Studies are currently being conducted to
choose which of these networks to implement in PASM. This work is a
part of that effort.
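The partitioning condition above can be made concrete with a short sketch. This is only an illustration of the stated address rule, not part of the PASM design; the function names `partition_members` and `is_valid_partition` are hypothetical, and the sketch assumes PEs are addressed 0 to N-1 with $N = 2^n$.

```python
def partition_members(n: int, p: int, low_bits: int) -> list[int]:
    """Return the 2**p PE addresses whose low-order n-p bits equal low_bits."""
    shift = n - p
    if not 0 <= p <= n or not 0 <= low_bits < (1 << shift):
        raise ValueError("need 0 <= p <= n and low_bits representable in n-p bits")
    # The high-order p bits range freely; the low-order n-p bits are fixed.
    return [(high << shift) | low_bits for high in range(1 << p)]


def is_valid_partition(n: int, addresses: list[int]) -> bool:
    """Check the stated condition: a power-of-two number of distinct PEs
    that all share the same value in their low-order n-p address bits."""
    size = len(set(addresses))
    if size != len(addresses) or size == 0 or size & (size - 1):
        return False
    p = size.bit_length() - 1
    if p > n or any(not 0 <= a < (1 << n) for a in addresses):
        return False
    mask = (1 << (n - p)) - 1
    return len({a & mask for a in addresses}) == 1
```

For N = 8, a partition of size 4 with low-order bit 1 consists of PEs 1, 3, 5, and 7, while PEs 0 to 3 do not satisfy the condition because their low-order bits differ.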

The Micro Controllers (MCs) are a set of $Q = 2^q$ microprogrammable
microprocessors, numbered (addressed) from 0 to Q-1, which act as the
control units for the PCU processors in SIMD mode and orchestrate the
activities of the PCU processors in MIMD mode. Each MC is attached to a
memory module (a pair of memory units so that memory loading and
computations can be overlapped). Control Storage contains the programs for
the MCs.


Figure 3.2 PASM Parallel Computation Unit.


Each MC controls N/Q PCU processors. The physical addresses of the
N/Q PEs connected to an MC, shown in Figure 3.3, have as their low-order
q bits the physical address of the MC, so that the network can be
partitioned. Possible values for N and Q are 1024 and 16, respectively. A
virtual SIMD machine (partition) of size RN/Q, $R = 2^r$ and
$1 \le r < q$, is obtained by loading R MC memory modules with the same
instructions simultaneously. In SIMD mode, the R MCs are synchronized and
each MC fetches instructions from its memory module, executing the control
flow instructions (e.g. branches) and broadcasting the data processing
instructions to its PCU PEs. Similarly, a virtual MIMD machine of size
RN/Q is obtained by combining the efforts of the PCU processors of R
MCs. In both cases, the physical addresses of these MCs must have the
same low-order q-r bits so that all of the PCU PEs in the partition have
the same low-order q-r physical address bits.
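The address relationships between MCs and PEs described above can be sketched as follows. The function names are hypothetical and chosen only for illustration; the sketch assumes $N = 2^n$ PEs, $Q = 2^q$ MCs, and a group of $R = 2^r$ MCs identified by the value of their shared low-order q-r address bits.

```python
def pes_of_mc(n: int, q: int, mc: int) -> list[int]:
    """Physical addresses of the N/Q PEs whose low-order q bits equal mc."""
    return [(k << q) | mc for k in range(1 << (n - q))]


def mcs_in_partition(q: int, r: int, low_bits: int) -> list[int]:
    """Physical addresses of the R = 2**r MCs sharing the given
    low-order q-r address bits."""
    return [(high << (q - r)) | low_bits for high in range(1 << r)]


def partition_pes(n: int, q: int, r: int, low_bits: int) -> list[int]:
    """All PEs of the virtual machine of size RN/Q formed by those MCs."""
    return sorted(
        pe
        for mc in mcs_in_partition(q, r, low_bits)
        for pe in pes_of_mc(n, q, mc)
    )
```

With N = 16 and Q = 4, MC 1 controls PEs 1, 5, 9, and 13. Grouping MCs 1 and 3 (R = 2, shared low-order bit 1) gives the eight odd-numbered PEs, all of which agree in their low-order q-r = 1 address bit.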

In each partition, the PCU PEs are assigned logical addresses.
Given a virtual machine of size RN/Q, the PEs have logical numbers 0 to
(RN/Q)-1 (the high-order r+n-q bits of the physical number).
Similarly, the MCs are assigned logical numbers from 0 to R-1 (for R > 1,
the high-order r bits of its physical number). The PASM language compilers
and operating system will be used to convert from logical to physical
addresses, so a system user will deal only with logical addresses.
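A minimal sketch of the logical numbering just described: because every PE in a partition shares the same low-order q-r physical address bits, dropping those bits leaves the high-order r+n-q bits as the logical number. The helper names are hypothetical, and this is not the PASM compiler or operating system mechanism, only the arithmetic implied by the text.

```python
def logical_pe(physical: int, q: int, r: int) -> int:
    """Logical PE number: the high-order r+n-q bits of the physical number."""
    return physical >> (q - r)


def physical_pe(logical: int, q: int, r: int, low_bits: int) -> int:
    """Inverse mapping, given the q-r low-order bits shared by the partition."""
    return (logical << (q - r)) | low_bits


def logical_mc(physical_mc: int, q: int, r: int) -> int:
    """Logical MC number: the high-order r bits of its q-bit physical number."""
    return physical_mc >> (q - r)
```

For N = 16, Q = 4, and R = 2 with shared low-order bit 1, physical PEs 1, 3, ..., 15 receive logical numbers 0 to 7, and MCs 1 and 3 receive logical numbers 0 and 1.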

The Memory Management System controls the loading and unloading of
the PCU memory modules. It employs a set of cooperating dedicated
microprocessors. The Memory Storage System provides secondary storage for
these files. Multiple devices are used to allow parallel data
transfers.

The Memory Storage System will consist of N/Q independent Memory
Storage units, numbered from 0 to (N/Q)-1. Each Memory Storage unit is
connected to Q PCU memory units. For $0 \le i < N/Q$, Memory Storage unit i
is connected to those memory modules whose physical addresses are of the
form (Q*i)+k, $0 \le k < Q$. Thus, Memory Storage unit i is connected to
the ith processor/memory module pair of each MC, as shown in Figure 3.4.
Since the PE memories are double-buffered, while one job is being
processed, results from the previous job can be stored and the next may be
loaded.
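The connection rule (Q*i)+k can be checked with a few lines of code. The sketch below uses hypothetical function names and only restates the address arithmetic from the paragraph above: module address (Q*i)+k belongs to MC k (its low-order q bits) and is the ith PE of that MC.

```python
def modules_of_storage_unit(Q: int, i: int) -> list[int]:
    """Physical addresses of the Q memory modules attached to storage unit i."""
    return [Q * i + k for k in range(Q)]


def storage_unit_of_module(Q: int, address: int) -> int:
    """Memory Storage unit serving the memory module at this physical address."""
    return address // Q


def mc_of_module(Q: int, address: int) -> int:
    """MC controlling the PE at this address (its low-order q bits)."""
    return address % Q
```

With N = 16 and Q = 4, Memory Storage unit 2 is connected to modules 8, 9, 10, and 11, one belonging to each of MCs 0 to 3.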

The two main advantages of this approach for a partition of size
N/Q are that (1) all of the memory modules can be loaded in parallel and
(2) the data is directly available no matter which partition (MC group)
is chosen. This is done by storing in Memory Storage unit i the data
for a task which is to be loaded into the ith logical memory module of
the virtual machine of size N/Q, $0 \le i < N/Q$. Thus, no matter which MC
group of N/Q processors is chosen, the data from the ith Memory Storage
unit can be loaded into the ith logical memory module of the virtual
machine, for all i, $0 \le i < N/Q$, simultaneously, i.e., in one parallel
block transfer. This same approach can be taken if only $(N/Q)/2^d$
distinct Memory Storage units are available, $0 \le d \le n-q$, except
that $2^d$ parallel block loads will be required instead of just one. In
general, a task needing RN/Q processors, $1 \le R \le Q$, logically
numbered 0 to (RN/Q)-1, will require R parallel block loads if the data
for the memory module whose high-order n-q logical address bits equal i is
loaded into Memory Storage unit i. This is true no matter which group of
R MCs (which agree in their low-order q-r address bits) is chosen. If
only $(N/Q)/2^d$ distinct Memory Storage units are available,
$0 \le d \le n-q$, then $R \cdot 2^d$ parallel block loads will be
required instead of just R.

Figure 3.4 Organization of the PASM Memory Storage System.
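The block-load counts stated above can be reproduced by counting how many logical memory modules fall on each Memory Storage unit. The sketch below is an illustration under one explicit assumption not stated in the text: each Memory Storage unit transfers one block at a time, so the number of parallel block loads equals the largest number of modules served by any single unit. All function names are hypothetical.

```python
from collections import Counter


def storage_unit_of_logical_module(logical: int, r: int) -> int:
    """Unit index: the high-order n-q bits of the (r+n-q)-bit logical address."""
    return logical >> r


def loads_needed(n: int, q: int, r: int) -> int:
    """Parallel block loads for a task on RN/Q processors, R = 2**r,
    assuming each storage unit serves one module per block load."""
    size = 1 << (r + n - q)  # RN/Q logical memory modules
    per_unit = Counter(storage_unit_of_logical_module(m, r) for m in range(size))
    return max(per_unit.values())


def loads_with_fewer_units(r: int, d: int) -> int:
    """Stated count R * 2**d when only (N/Q) / 2**d units are available."""
    return (1 << r) * (1 << d)
```

For N = 1024 and Q = 16, a task using R = 4 MCs needs 4 parallel block loads with all 64 Memory Storage units present, and 32 if only 8 units are available (d = 3).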

A set of microprocessors is dedicated to performing the Memory
Management System tasks in a distributed fashion, i.e., one processor
handles Memory Storage System bus control, one handles the scheduling
tasks, etc. This distributed processing approach is chosen in order to
provide the Memory Management System with a large amount of processing
power at low cost and high speed (due to the parallelism possible).

This overview of PASM, a large scale partitionable SIMD/MIMD
multimicroprocessor system for image processing and pattern recognition,
has been provided as background material for the following chapters.
For additional information about various aspects of PASM see:
organization [SI3,SMS1,SSKMS], instruction set [SM1], masking schemes for
enabling and disabling PEs [SI1,SI2,SMS1,SSKMS], interconnection networks
[MS,SI1,SI4,SI5,SI6,SS1,SS2,SSMA], operating system [SSMMS], programming
language [MSS1], memory management system [SKW,SSKMS], and examples
of use [SI7,FSS,MSS2,SMS2,SSE].
CHAPTER 4

NETWORK  DEFINITIONS 

In the SIMD environment it is useful to describe the
interconnection network as a set of interconnection functions, where each
is a permutation (bijection) on the set of PE addresses [SI1]. When
interconnection function f is applied, network input i is connected to
network output f(i) for all i, $0 \le i < N$, simultaneously. That is,
saying that the interconnection function maps the source address S to the
destination address D is equivalent to saying the interconnection function
causes data sent on the input line with address S to be routed to the
output line with address D.
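As an illustration of this definition, the following sketch treats an interconnection function as a bijection on the addresses 0 to N-1 and routes the data item on input line S to output line f(S). The names `is_interconnection_function` and `apply_function` are hypothetical and introduced only for this example.

```python
from typing import Callable, Sequence


def is_interconnection_function(f: Callable[[int], int], N: int) -> bool:
    """True if f is a permutation (bijection) on the addresses 0..N-1."""
    return sorted(f(i) for i in range(N)) == list(range(N))


def apply_function(f: Callable[[int], int], data: Sequence) -> list:
    """Route the item on input line s to output line f(s), for all s at once."""
    N = len(data)
    if not is_interconnection_function(f, N):
        raise ValueError("f is not a bijection on the PE addresses")
    out = [None] * N
    for s in range(N):
        out[f(s)] = data[s]
    return out
```

For example, with N = 8 and f(i) = (i + 1) modulo 8, the item on input line 0 appears on output line 1, and the item on input line 7 wraps around to output line 0.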

The  physical  structure  of  an  interconnection  network  can  be 
described  by  several  parameters.  A  link  or  connection  carries  messages 
or  data  in  the  network  between  other  network  elements.  A  switching 
element  selects  the  link  or  links  over  which  messages  or  data  will  be 
sent  through  the  network.  A  set  of  links  connecting  a  network  input,  or 
source,  to  a  network  output,  or  destination,  is  called  a  route. 

The Generalized Cube network [SS1] is shown in Figure 4.1 for
N = 8, where N is the number of inputs to the network. It is an
$n = \log_2 N$ stage network where each stage implements one of the cube
interconnection functions [SS1]. The n cube functions are defined by

$$\mathrm{cube}_i(p_{n-1} \cdots p_0) = p_{n-1} \cdots p_{i+1}\,\bar{p}_i\,p_{i-1} \cdots p_0$$

for $0 \le i < n$, where $\bar{p}_i$ denotes the complement of $p_i$. The
switching elements of this network are called interchange boxes. For
performing permutations there are two legitimate states of an interchange
box: (1) straight - input i to output i, input j to output j; and
(2) exchange - input i to output j, input j to output i. In each stage of
this network the pair of inputs to an interchange box is selected so that
$\mathrm{cube}_i$ maps one to the other, and vice versa. When an
interchange box in stage i is set to exchange, the data items input to
that interchange box are transferred as specified by the $\mathrm{cube}_i$
interconnection function. When set to straight, data items input are
transferred according to the identity function, i.e.
$\mathrm{identity}(p_{n-1} \cdots p_0) = p_{n-1} \cdots p_0$. Since each
interchange box is individually controlled, each stage i will perform the
$\mathrm{cube}_i$ interconnection function on some subset of the data
items depending on the settings of the interchange boxes.

Figure 4.1 The Generalized Cube network for N = 8. The straight and
exchange connections of the interchange box are shown.
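The cube functions and the behavior of one stage can be sketched in a few lines. This is an illustrative model only, not the network's control scheme: complementing address bit i is written as an exclusive-or, and each interchange box in stage i is identified by the input whose bit i is 0. The names `cube` and `apply_cube_stage` are hypothetical.

```python
def cube(i: int, address: int) -> int:
    """cube_i: complement bit i of the address."""
    return address ^ (1 << i)


def apply_cube_stage(i: int, data: list, exchange_boxes: set[int]) -> list:
    """Pass data through stage i.

    exchange_boxes holds, for every box set to exchange, the address of its
    input with bit i equal to 0; all other boxes are set to straight.
    """
    out = list(data)  # straight boxes leave their two items in place
    for low in exchange_boxes:
        high = cube(i, low)
        if low & (1 << i) or not 0 <= high < len(data):
            raise ValueError("box must be named by its input with bit i = 0")
        out[low], out[high] = data[high], data[low]
    return out
```

Setting every box of stage 2 to exchange for N = 8 applies cube_2 to all items, swapping the upper and lower halves; setting only one box of stage 0 to exchange swaps just that pair.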

There is a class of cube-type networks of which the Generalized
Cube is representative. By combining the results of [SI4,SS1,WF1,WF2]
it can be seen that all of the following networks are topologically
equivalent: Generalized Cube [SS1], the STARAN flip network [BA], the
omega network [LA], and the indirect binary n-cube network [PE]. (The
SW-banyan (S=F=2) is defined as a graph [GL] and has the same topology
as a multistage cube [WF2].) For this reason the Generalized Cube can be
used as a standard for comparing cube-type networks with other
interconnection networks.

The augmented data manipulator (ADM) network is shown in Figure 4.2
for N = 8. It is an N input, n stage network based on the PM2I
(plus-minus 2^i) interconnection functions [SI1]. Each of the n stages
consists of N switch cells. There is also an (n+1)-st column of network
output cells. The PM2I functions are defined by

PM2_{+i}(j) = (j + 2^i) modulo N

and

PM2_{-i}(j) = (j - 2^i) modulo N

for 0 ≤ j < N, 0 ≤ i < n. Note that PM2_{+(n-1)} = PM2_{-(n-1)}. Each
cell of the ADM can receive none, one, two, or three of the signals
straight, PM2_{+i}, and PM2_{-i} [SI1,SI6]. Corresponding to Figure 4.2,
the signal "PM2_{+i}" means use the solid line connection; "PM2_{-i},"
the dashed line connection; and "straight," the dotted line connection.
Stages of the network are numbered from n-1 to 0. The data output from
cell j at stage i becomes the data input to cell k at stage i-1 where
k ∈ {(j - 2^i) modulo N, j, (j + 2^i) modulo N}. Each cell is controlled
independently of any other cell.

Figure 4.2 The augmented data manipulator (ADM) network for N = 8.
Straight connections are shown by the dotted lines; PM2_{+i}, by
the solid lines; and PM2_{-i}, by the dashed lines.
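
As a minimal sketch of the PM2I definitions, the following Python
functions compute the two non-straight connections and the set of cells
reachable from cell j at stage i. The function names are chosen here for
illustration and do not appear in the original work.

```python
def pm2_plus(i, j, N):
    """PM2_{+i}(j) = (j + 2**i) modulo N."""
    return (j + (1 << i)) % N


def pm2_minus(i, j, N):
    """PM2_{-i}(j) = (j - 2**i) modulo N."""
    return (j - (1 << i)) % N


def adm_successors(i, j, N):
    """Cells at the next stage that cell j at stage i can send data to."""
    return {pm2_minus(i, j, N), j, pm2_plus(i, j, N)}


N = 8
print(sorted(adm_successors(0, 0, N)))    # [0, 1, 7]
print(sorted(adm_successors(2, 3, N)))    # [3, 7]
```

The second call shows the identity PM2_{+(n-1)} = PM2_{-(n-1)}: at stage
n-1 the plus and minus links lead to the same cell, so only two distinct
successors exist.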

The ADM network is based on Feng's data manipulator [FE]. The data
manipulator is also based on the PM2I functions and consists of n+1
columns of N cells. There are again three connections from an input
cell j at stage i, namely PM2_{+i}, PM2_{-i}, and straight, where
0 ≤ j < N and 0 ≤ i < n. All but the last column are controlled by a
pair of signals selected from a group of six. U_1^i (PM2_{-i}),
D_1^i (PM2_{+i}), and H_1^i (straight) control those input cells at
stage i whose ith address bit is 0. The signals U_2^i (PM2_{-i}),
D_2^i (PM2_{+i}), and H_2^i (straight) control those cells whose ith
address bit is 1. Thus, the ADM is a data manipulator network with
individual cell control.

In  an  SIMD  environment,  the  network  configuration  established  in 
the  Generalized  Cube  or  ADM  network  would  depend  on  the  permutation  of 
network  inputs  to  outputs  desired.  As  an  example,  for  the  permutation 
which maps any input x to (x+3) modulo N, 0 ≤ x < N, the settings for
both networks, when N = 8, are shown in Figures 4.3 and 4.4. Not all
permutations  of  N  items  can  be  performed  by  these  networks  in  one  pass 
through  the  network.  However,  the  permutation  capability  of  the  ADM 
network  is  known  to  be  a  superset  of  that  of  the  Generalized  Cube 

CHAPTER 5

COUNTING  GENERALIZED  CUBE  PERMUTATIONS 

The N-input Generalized Cube network has Nn/2 interchange boxes.
For permuting data, each interchange box can be individually set to one
of two states, either straight or exchange (see Figure 4.1). Thus,
there are 2^{Nn/2} different ways to set the Nn/2 interchange boxes. It
is clear from the structure of the network that every possible setting
will result in a one to one mapping of inputs to outputs, i.e. a
permutation, since each interchange box performs one to one connections.

The  following  theorem  is  needed  to  show  that  a  one-to-one 
correspondence  exists  between  network  settings  and  permutations  for  the 
Generalized  Cube. 

Theorem: There is one and only one route between any source and
destination for the Generalized Cube network.

Proof: Consider an arbitrary source, S = (s_{n-1} ... s_0), and a
destination, D = (d_{n-1} ... d_0). For a route connecting S to D to
exist, the cube_i interconnection functions, 0 ≤ i < n, which are
implemented by the physical network hardware must be able to map S to D.
In each stage of the network there is exactly one interchange box with
an input labelled by some given address. Thus S can be mapped to D if
first in stage n-1 the interchange box with S as an input is set to
straight if s_{n-1} = d_{n-1} or set to exchange if s_{n-1} ≠ d_{n-1}.
The straight connection maps (s_{n-1} ... s_0) to
(s_{n-1} ... s_0) = (d_{n-1} s_{n-2} ... s_0). The exchange connection
performs cube_{n-1}, mapping (s_{n-1} ... s_0) to
(\bar{s}_{n-1} s_{n-2} ... s_0) = (d_{n-1} s_{n-2} ... s_0). This
procedure can be repeated for stage n-2, setting the interchange box
with (d_{n-1} s_{n-2} ... s_0) as an input to the correct state to map
(d_{n-1} s_{n-2} ... s_0) to (d_{n-1} d_{n-2} s_{n-3} ... s_0).

The procedure can be continued for stages n-3 through 0 mapping an
arbitrary source to any destination. The procedure is deterministic and
there is only one valid choice of interchange box state for each stage,
so there is only one route between a source and destination.

□


Now consider two distinct network settings. There must be at least
one interchange box which is set straight in one of the settings, and
exchange in the other. Pick a source, S, which is mapped to its
destination, D, through this particular interchange box for one of the
settings. There is only one path through the network between any
source/destination pair. Thus, using the other setting does not allow S
to map to D, giving a distinct permutation. A permutation is said to be
passable by an interconnection network if the physical network structure
(i.e., interchange boxes, for the Generalized Cube) allows the
connections to be made. Therefore, each distinct setting results in a
distinct permutation giving a total of 2^{Nn/2} permutations passable by
the Generalized Cube (and its equivalents [SS1]).
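
For small networks the count can be checked by exhaustive enumeration.
The sketch below is an illustration written for this discussion, under
the assumption that stage i simply swaps the two addresses of every box
set to exchange and that data pass through stages n-1 down to 0. It
builds every setting and collects the resulting permutations in a set.

```python
from itertools import product


def stage_mappings(i, n):
    """All 2**(N/2) mappings that stage i can perform, N = 2**n."""
    N = 1 << n
    lows = [p for p in range(N) if not (p >> i) & 1]   # one address per box
    mappings = []
    for states in product((False, True), repeat=len(lows)):
        mapping = list(range(N))
        for low, exchange in zip(lows, states):
            if exchange:
                high = low | (1 << i)
                mapping[low], mapping[high] = high, low
        mappings.append(tuple(mapping))
    return mappings


def cube_permutations(n):
    """Distinct input-to-output permutations over all network settings."""
    N = 1 << n
    perms = {tuple(range(N))}
    for i in reversed(range(n)):                       # stages n-1 .. 0
        stage = stage_mappings(i, n)
        perms = {tuple(m[x] for x in p) for p in perms for m in stage}
    return perms


print(len(cube_permutations(3)), 2 ** (8 * 3 // 2))    # 4096 4096
```

Because the set holds as many permutations as there are settings, no two
settings of the 8-input network produce the same permutation, which is
what the argument above establishes in general.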

This permutation count for the Generalized Cube network is relatively
straightforward. It is included here for later comparison to the
number of permutations performable by the ADM network.


CHAPTER  6 


COUNTING  AUGMENTED  DATA  MANIPULATOR  PERMUTATIONS 

6.1 Introduction

Unlike the case of the Generalized Cube network, the question of
the number of distinct permutations passable by the ADM network does not
yield to a straightforward consideration of all possible network states.
There are two reasons for this difficulty. One is the fact that in the
ADM, unlike the Generalized Cube, an arbitrary network setting may not
result in a permutation of network inputs to outputs. Figure 6.1 shows
an example of this. When the routes of two different source/destination
pairs have any links in common a collision is said to exist. Data
passing through the network can be lost in this situation. In Figure 6.1
each cell is performing an allowable switch setting. However, in stage 1
both cells 1 and 4 connect to cell 1 and in stage 0 cell 5 connects to
both cells 4 and 5. If the network setting is f, then f(1) = f(4) = 1
and f(5) = 4 or 5. Clearly, f is not a permutation. The second reason is
that for certain permutations more than one valid network setting
exists. Figure 6.2 shows two settings which are equivalent. In each case
the same permutation of network inputs to outputs is performed, that is
0 to 3, 1 to 6, 2 to 5, 3 to 2, 4 to 7, 5 to 4, 6 to 1, and 7 to 0.

For the remainder of the discussion, ADM network performable
permutations are referred to as overall permutations. Configurations of
stage j of the network, for 0 ≤ j < n, which are permutations of stage j
inputs to outputs are called stage j permutations.

Figure 6.1 Example of a network setting for N = 8 which does not
correspond to an overall permutation.

The  approach  to  counting  the  number  of  overall  permutations  will  be 
first  to  determine  what  type  of  network  settings  give  a  permutation  of 
inputs  to  outputs.  Next,  the  number  of  stage  0  permutations  is  counted 
and  the  result  generalized  to  any  stage.  Finally,  using  partitioning 
theory  and  the  results  concerning  network  stages,  the  network  is  treated 
as  two  subnetworks  connected  to  stage  0,  and  upper  and  lower  bounds  on 
the  number  of  data  permutations  performable  by  the  entire  ADM  network 
are  established. 


6.2 Stage Permutations

Before the number of overall permutations performable by the ADM
network can be counted, the two difficulties described in the previous
section must be addressed. The following answers the first difficulty,
that of determining which type of network settings correspond to
permutations.

Lemma 6.1: An ADM network configuration is an overall permutation if and
only if it consists of stage i permutations for all i, 0 ≤ i < n.

Proof: Assume a given network configuration is an overall permutation.
For this to be true there can be no conflict of data at any cell in the
network, i.e., no cell can receive data from more than one cell in the
previous stage. Because each stage has the same number of cells, no
cell can fail to receive data, without conflict or loss of data in that
stage. Thus, if the network configuration is an overall permutation
then each stage i configuration must be a permutation for all i,
0 ≤ i < n.

Assume that each stage i configuration is a permutation for all i,
0 ≤ i < n. Because of this constraint on the stage configurations, no
conflict can exist in the network. This implies that the network
configuration is an overall permutation.

□
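
Lemma 6.1 can be checked exhaustively for a small network. The sketch
below is not part of the original development. It assumes that every
cell forwards its data along exactly one of its three links, so a stage
setting is a function on cell addresses, and it tries all 3^N settings
of each stage for N = 4.

```python
from itertools import product


def stage_function(i, N, choice):
    """Stage i mapping when cell j uses the offset choice[j] * 2**i."""
    return tuple((j + c * (1 << i)) % N for j, c in enumerate(choice))


def is_permutation(f):
    return sorted(f) == list(range(len(f)))


def check_lemma_6_1(n):
    """True if, for every setting, the overall mapping is a permutation
    exactly when every stage mapping is a permutation."""
    N = 1 << n
    per_stage = [
        [stage_function(i, N, c) for c in product((-1, 0, 1), repeat=N)]
        for i in reversed(range(n))                # stages n-1 .. 0
    ]
    for stages in product(*per_stage):
        overall = tuple(range(N))
        for f in stages:
            overall = tuple(f[x] for x in overall)
        if is_permutation(overall) != all(is_permutation(f) for f in stages):
            return False
    return True


print(check_lemma_6_1(2))                          # True
```

For N = 4 this examines 81 * 81 = 6561 settings. Larger networks are
better handled by the counting arguments that follow.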

This lemma, while obvious, is presented because it establishes the
criteria for permutation passability in the ADM network, which is
central to the development that follows. A permutation is passable by
the ADM network if and only if a set of N routes exists which performs
the desired mapping without conflict.

To deal with the second difficulty, that of generating overall
permutations with more than one network setting, a divide and conquer
approach will be used. This will limit the need to check for setting
redundancy to stage 0 of the network. First, the configurations of
stage 0 are investigated.

Stage 0 is the only stage which can affect the low order bit of a
source address, causing mapping to a destination with a low order bit
that is either the same or different from that of the source. Let
S = (s_{n-1} s_{n-2} ... s_1 s_0) be a source, and
D = (d_{n-1} d_{n-2} ... d_1 d_0) its destination. A connection in
stage 0 that does not affect the low order bit of the destination
address, i.e., s_0 = d_0, is called a straight connection. A connection
that changes the low order bit, s_0 ≠ d_0, is called an exchange. This
is shown in Figure 6.3. A regular exchange is between stage 0 cells
(p_{n-1} ... p_1 0) and (p_{n-1} ... p_1 0 + 2^0) modulo N. An irregular
exchange is between stage 0 cells (p_{n-1} ... p_1 0) and
(p_{n-1} ... p_1 0 - 2^0) modulo N. Because a permutation is one to one,
any possible stage 0 permutation, except the all +2^0 or all -2^0
configurations, consists of straight and/or exchange connections only
(i.e., every +2^0 or -2^0 connection is part of an exchange) [SI6]. The
all +2^0 and all -2^0 connections form permutations because every cell
uses +2^0 or every one uses -2^0 (modulo N arithmetic).

Figure 6.3 Stage 0 connection types; panel (c) shows an irregular
exchange.

Consider the stage 0 permutations other than the all +2^0 or all
-2^0. They can be represented by an N-bit binary number, called the
characteristic binary number. A binary digit is associated with each
adjacent pair of cells, including a digit for the wrap-around pairing of
the cells labeled 0 and N-1. If the adjacent pair of cells together
form an exchange connection, the characteristic binary digit is 1. If
not, the digit is 0. An example of this assignment is shown in Figure
6.4.

In order to use the characteristic binary numbers for counting
stage 0 permutations, two kinds of digit adjacency are distinguished.
When the first and last bits of the characteristic binary numbers are
not considered adjacent it is linear adjacency. When the first and last
bits are considered adjacent it is circular adjacency.

Lemma 6.2: Every stage 0 permutation, except the settings all +2^0 or
all -2^0, has a unique characteristic binary number with no circularly
adjacent bits that are both 1s.

Figure 6.4 A stage 0 permutation and its characteristic binary number
for N = 8.

Proof: Every stage 0 permutation, except the all +2^0 and all -2^0
configurations, can be formed from straight and exchange connections
[SI6]. If the characteristic binary number of a configuration has
circularly adjacent 1s, then there is a cell involved in two exchanges
such that

p_{n-1} p_{n-2} ... p_1 p_0 → (p_{n-1} p_{n-2} ... p_1 p_0 + 2^0) modulo N

and

p_{n-1} p_{n-2} ... p_1 p_0 → (p_{n-1} p_{n-2} ... p_1 p_0 - 2^0) modulo N

This is shown in Figure 6.5. This mapping is not one-to-one, hence the
configuration is not a permutation. If the associated binary number has
no circularly adjacent 1s, then every stage 0 input can be involved in
at most one exchange. Since every input is involved in either a
straight or an exchange connection, the configuration will be
one-to-one, and hence a permutation.

□

The characteristic binary numbers of stage 0 permutations can be
used to count the number of these permutations. Lemmas 6.3 and 6.4 are
based upon [OS].

Lemma 6.3: The number of N-bit binary numbers with no linearly adjacent
1s is found using the recursive relationship

L(N) = L(N-1) + L(N-2)

Figure 6.5 Configuration implied by two adjacent 1s in the
characteristic binary number. This is not a permutation.

Proof: If an N-bit number ends in a 0, then it will have no linearly
adjacent 1s if it has no linearly adjacent 1s in the first N-1 bits. The
number of all such N-bit numbers is L(N-1). If an N-bit number ends in
1, then the immediately preceding bit must be a 0 if the number is to
have no linearly adjacent 1s. Also, the first N-2 bits of the number
must have no linearly adjacent 1s. The number of all such N-bit numbers
is L(N-2). Thus, L(N) = L(N-1) + L(N-2).

The initial conditions may be derived by noting that L(2) = 3
(i.e.: 00,01,10) and L(3) = 5 (i.e.: 000,100,010,001,101).

□

Lemma 6.4: The number of N-bit binary numbers with no circularly
adjacent 1s is

C(N) = L(N) - L(N-4)

where N ≥ 8.

Proof: L(N) exceeds C(N) by the number of N-bit numbers with no linearly
adjacent 1s which do have circularly adjacent 1s. These numbers are all
of the form

1 0 a_1 a_2 ... a_{N-4} 0 1

where the number a_1 a_2 ... a_{N-4} is a binary number with no linearly
adjacent 1s. There are L(N-4) such numbers. Thus, C(N) = L(N) - L(N-4).

□

Theorem 6.1: For an N-input ADM network, the number of stage 0
permutations is

P_0(N) = C(N) + 2

where N ≥ 8. Also, P_0(2) = 2 and P_0(4) = 9.

Proof: By Lemma 6.2, P_0(N) will be equal to the number of
characteristic binary numbers with no circularly adjacent 1s, plus the
two cases all +2^0 and all -2^0. P_0(2) can be counted by direct
enumeration. P_0(4) can be counted either by direct enumeration or by
noting that C(4) = L(4) - 1, the "-1" being for the case 1001.

□
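
The formulas of Lemmas 6.3 and 6.4 and Theorem 6.1 are easy to evaluate
and to compare against direct enumeration. The following sketch is an
illustration only. The brute-force routine assumes that each stage 0
cell uses exactly one of its three links and simply tries all 3^N
combinations, so it is practical only for small N.

```python
from itertools import product


def L(N):
    """N-bit binary numbers with no linearly adjacent 1s, N >= 2."""
    a, b = 3, 5                      # L(2), L(3)
    if N == 2:
        return a
    for _ in range(N - 3):
        a, b = b, a + b
    return b


def C(N):
    """N-bit binary numbers with no circularly adjacent 1s (needs N >= 6)."""
    return L(N) - L(N - 4)


def P0(N):
    """Number of stage 0 permutations of an N-input ADM network."""
    if N == 2:
        return 2
    if N == 4:
        return 9
    return C(N) + 2


def brute_force_P0(N):
    """Count distinct stage 0 permutations by trying every cell setting."""
    perms = set()
    for choice in product((-1, 0, 1), repeat=N):
        f = tuple((j + c) % N for j, c in enumerate(choice))
        if len(set(f)) == N:         # one to one, hence a permutation
            perms.add(f)
    return len(perms)


for N in (2, 4, 8):
    print(N, P0(N), brute_force_P0(N))
```

For N = 8 both routes give 49: the 47 characteristic binary numbers with
no circularly adjacent 1s, plus the all +2^0 and all -2^0 settings.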

The  method  used  to  count  the  number  of  stage  0  permutations  can  be 
applied  to  any  stage  of  the  ADM  network. 

Theorem 6.2: For an N-input ADM network, the number of stage i
permutations is

P_i(N) = [P_0(N/2^i)]^{2^i}

where N ≥ 2, and 0 ≤ i < n.

Proof: To count the number of ways in which stage i can permute data,
consider the set of cells
S^j = {j, j+2^i, j+2*2^i, ..., j+(2^{n-i}-1)*2^i} for a fixed j,
0 ≤ j < 2^i. In stage i an arbitrary cell, k, can be mapped to any of
{(k-2^i) modulo N, k, (k+2^i) modulo N}. That is, if k ∈ S^j, for any
fixed j, 0 ≤ j < 2^i, then (k-2^i) modulo N ∈ S^j and
(k+2^i) modulo N ∈ S^j. Since successive elements of S^j differ by 2^i,
this mapping of k is completely analogous to that of k' mapping to

{(k'-2^0) modulo 2^{n-i}, k', (k'+2^0) modulo 2^{n-i}}

where k' ∈ S' = {0,1,...,2^{n-i}-1}. This second mapping is that of
stage 0 in an ADM network with 2^{n-i} inputs. Thus the cells of S^j can
perform P_0(2^{n-i}) = P_0(N/2^i) permutations. There are 2^i possible
values of j, each defining a set such that S^j ∩ S^k = ∅ for k ≠ j and
0 ≤ j,k < 2^i. That is, the 2^i sets are disjoint, so the permutations
performable on the cells of S^j are independent of those performable on
S^k, for k ≠ j. Thus

P_i(N) = [P_0(N/2^i)]^{2^i}.
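
Theorem 6.2 can be compared with direct enumeration for N = 8. In the
sketch below, which is illustrative and not taken from the original
work, the values of P_0 are those of Theorem 6.1 and the brute-force
count again assumes one link per cell.

```python
from itertools import product

# Stage 0 counts from Theorem 6.1.
P0 = {2: 2, 4: 9, 8: 49}


def theorem_6_2(i, N):
    """P_i(N) = P_0(N / 2**i) ** (2**i)."""
    return P0[N >> i] ** (1 << i)


def brute_force_stage_count(i, N):
    """Count distinct stage i permutations by trying every cell setting."""
    step = 1 << i
    perms = set()
    for choice in product((-1, 0, 1), repeat=N):
        f = tuple((k + c * step) % N for k, c in enumerate(choice))
        if len(set(f)) == N:
            perms.add(f)
    return len(perms)


for i in (2, 1, 0):
    print(i, theorem_6_2(i, 8), brute_force_stage_count(i, 8))
```

The counts are 16, 81, and 49 for stages 2, 1, and 0. At stage 2 the
+2^2 and -2^2 links coincide, which is why each pair of cells
contributes only P_0(2) = 2 possibilities.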


6.3 Network Permutations

The remaining obstacle to counting the number of distinct ADM
performable overall permutations is determining what class of
permutations have more than one network setting. If the network is small
this task can be avoided. For an ADM network where N = 4, the number of
performable permutations can be counted by direct enumeration.

Lemma 6.5: For N = 4, the ADM network can perform all possible N! = 24
permutations.

Proof: By direct enumeration (see [SS3]).

□

For N > 4, direct enumeration is not a practical alternative for
counting the number of overall permutations. Consider conceptually
separating stage 0 from the rest of the network. This is shown in
Figure 6.6. Stages n-1 through 1 can be partitioned into two independent
subnetworks, each with N/2 inputs [SI6,SS1]. All the odd-numbered cells
in stages n-1 through 1 will constitute one of the subnetworks. This is
the odd subnetwork. All the even-numbered cells in stages n-1 through 1
constitute the other subnetwork, the even subnetwork. The relationship
of these two subnetworks to the final stage of the N-input ADM network,
stage 0, is shown in Figure 6.7.

Figure 6.6 The first step in the conceptual process of partitioning an
ADM network for N = 8 into two independent subnetworks
joined at stage 0. Each cell that interfaces stage 1 and 0
is shown divided into an output cell from stage 1 and an
input cell to stage 0.

The partitioning described connects the outputs of the even
subnetwork to all even-numbered inputs of stage 0. The outputs of the
odd subnetwork are connected to the odd-numbered inputs of stage 0.
Partitioning the ADM network allows an N-input network to be treated as
two N/2-input independent ADM networks combined at stage 0 of the
N-input network.

Lemma 6.6: The four stage 0 permutations all regular exchanges, all
irregular exchanges, all +2^0, and all -2^0 connect all even subnetwork
outputs to odd numbered network outputs and all odd subnetwork outputs
to even numbered network outputs. Furthermore, no other stage 0
permutation does this.

Proof: The four named permutations each connect all even subnetwork
outputs to odd network outputs and all odd subnetwork outputs to even
network outputs because each forces d_0 ≠ s_0 for all source/destination
pairs. This is shown in Figure 6.8 for N = 8. A permutation not of the
named set must have a straight connection. If an output, D, is connected
to a straight stage 0 link, then d_0 = s_0. Thus, the four named stage
0 permutations are the only ones with this property.

□

Figure 6.7 Cells from stages 2 and 1 of an ADM network for N = 8
rearranged into the two independent subnetworks, each with N/2
inputs. E and O designate even and odd subnetwork, respectively.

Figure 6.8 Illustration of the four stage 0 permutations for N = 8
which connect all even subnetwork outputs to odd network
outputs, and odd subnetwork outputs to even network outputs.
They are (a) all regular exchanges, (b) all irregular
exchanges, (c) all +2^0, and (d) all -2^0. Even and odd source
subnetworks are indicated by E and O, respectively.

Consider an arbitrary destination, D, of an N-input ADM network.
The source for the data arriving at D may have been either the even
subnetwork or the odd subnetwork (see Figure 6.7) depending on the
stage 0 configuration. Call the subnetwork which is the source of D the
source subnetwork.

Lemma 6.7: Consider the set of all stage 0 permutations except all
regular exchanges, all irregular exchanges, all +2^0, and all -2^0. For
each of these permutations the set of pairings of source subnetwork with
network output is unique.

Proof: Proof by contradiction. Assume that two distinct stage 0
permutations of this set both link the same source subnetwork to a given
network output, and that this is true for any network output. That is,
the two permutations have identical sets of pairings. A permutation of
the named set must have a straight connection. If an output, D, is
connected to a straight stage 0 link, then d_0 = s_0. If it is connected
to an exchange, then d_0 ≠ s_0, and the source subnetwork differs from
the previous case. Thus all straight connections of one permutation must
be duplicated in the other, and vice versa, if the source subnetworks
are to be the same for each output.

A circularly adjacent pair of 0s in a characteristic binary number
corresponds to a straight connection. Specifying all straight
connections thus specifies all circularly adjacent 0s in the
characteristic binary numbers of both permutations. The remaining bits
of the numbers must contain no circularly adjacent 0s. Recall that no
circularly adjacent 1s may appear since this is a permutation. Thus, the
first and last bits of any contiguous unspecified bit positions must be
1s and the interior bits must alternate 1s and 0s. Single unspecified
bit positions must become 1s. Each unspecified bit position is thus
assigned a unique value. Therefore, both numbers are identical. The
permutations cannot be distinct.

□
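
Lemmas 6.6 and 6.7 can be illustrated for N = 8 by grouping the stage 0
permutations according to their pairing of source subnetwork with
network output. The code below is a sketch written for this purpose. The
encoding of a pairing as a tuple of parities (0 for the even subnetwork,
1 for the odd subnetwork) is an assumption made here.

```python
from collections import defaultdict
from itertools import product


def stage0_permutations(N):
    """All stage 0 permutations, assuming one link per cell."""
    perms = set()
    for choice in product((-1, 0, 1), repeat=N):
        f = tuple((j + c) % N for j, c in enumerate(choice))
        if len(set(f)) == N:
            perms.add(f)
    return perms


def source_pairing(f):
    """For each network output d, the parity of the subnetwork feeding it."""
    pairing = [None] * len(f)
    for s, d in enumerate(f):
        pairing[d] = s % 2
    return tuple(pairing)


def group_by_pairing(N):
    groups = defaultdict(list)
    for f in stage0_permutations(N):
        groups[source_pairing(f)].append(f)
    return groups


groups = group_by_pairing(8)
print(len(groups))                                   # 46
print(sorted(len(g) for g in groups.values())[-2:])  # [1, 4]
```

There are 46 = P_0(8) - 3 distinct pairings. One of them is shared by
exactly four stage 0 permutations (the four named in Lemma 6.6), and
every other pairing belongs to a single permutation.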

Let P(N) be the number of distinct overall permutations performable
by an N-input ADM network. And let P_U(N) and P_L(N) be upper and lower
bounds on P(N), respectively.

Theorem 6.3: A lower bound on the number of distinct overall
permutations performable by the N-input ADM network is

P_L(N) = P_L(N/2)^2 * [P_0(N) - 3]

where P_L(4) = P(4) = 24; N ≥ 8.

Proof: As a result of Lemma 6.1 only configurations which are
permutations at every stage need be considered in the following. The
number of permutations performable by one of the two independent
subnetworks which can be formed by partitioning the ADM is, by
definition, at least P_L(N/2). Call the permutations available at the
inputs of stage 0 the input permutations. Because the two subnetworks of
the partition are independent, the number of distinct input permutations
is at least P_L(N/2)^2.

Now, consider an arbitrary overall permutation. Assume that the
stage 0 permutation is fixed. Any change in the input permutation will
result in a change in the overall permutation. This is because
(a•c = b•c) implies a = b where a, b, and c are permutations and • is
composition. Assume the input permutation and the stage 0 permutation
are both allowed to change. Let the stage 0 permutation be restricted
so that only one of the permutations all regular exchanges, all
irregular exchanges, all +2^0, or all -2^0 is allowed; there are
P_0(N)-3 of these. As a consequence of Lemma 6.6 and Lemma 6.7, any
change in the stage 0 permutation will cause at least one output address
to be mapped from a different subnetwork. But because the two
subnetworks are independent, no change of the input permutation can
result in the mapping of a source of one subnetwork to the output of the
other. The resulting overall permutation cannot be the same as the
original no matter how stage 0 and the subnetworks are manipulated.

Thus, no overall permutation can be duplicated by changing the
input permutation and/or changing the stage 0 permutation (provided the
stage 0 permutation is not one of the three excluded). Hence, each
composition of input permutation with allowable stage 0 permutation
(i.e., not one of the three excluded) results in a unique overall
permutation. So, the number of input permutations is multiplied by the
number of stage 0 permutations, minus the three special cases, to yield
the lower bound given on the number of performable overall permutations.

The boundary condition was stated in Lemma 6.5.

Theorem 6.4: An upper bound on the number of distinct overall
permutations performable by the N-input ADM network is

P_U(N) = P_U(N/2)^2 * P_0(N)

where P_U(4) = P(4) = 24 and N ≥ 8.

Proof: Assuming that the composition of any input permutation with any
stage 0 permutation (including all regular exchanges, all irregular
exchanges, all +2^0, and all -2^0) yields a unique overall permutation,
gives the above result.

□

For an ADM network with N = 8, an exact count of the number of
performable permutations can be derived.

Theorem 6.5: P(8) = P(4)^2 * [P_0(8) - 3].

Proof: From Lemma 6.6 and Lemma 6.7 the stage 0 permutations all regular
exchanges, all irregular exchanges, all +2^0, and all -2^0 are the only
stage 0 permutations which share a common set of pairings of source
subnetwork with network output. Consider a particular overall
permutation involving a stage 0 permutation selected from this set of
four. The same overall permutation can be maintained after changing
stage 0 to another of the given set of four stage 0 permutations if the
input permutation can be suitably modified. For example, Figure 6.2
shows a given overall permutation in the upper network which uses the
all +2^0 setting in stage 0. The lower network shows stage 0 set to all
regular exchanges and the necessary changes in the settings of stages 2
and 1 made so that the same overall permutation is performed. The
changes in stages 2 and 1 settings accomplish the needed input
permutation modification. Since the choice of source subnetwork remains
unchanged for all outputs after resetting stage 0, the necessary changes
in the input permutation will occur only within the two independent
stage 0 subnetworks. For N = 8 these subnetworks are themselves 4-input
ADM networks which can perform any permutation of four items (Lemma
6.5). Therefore, any needed modification of the input permutation can be
performed. So, the overall permutations performable using any member of
the given set of four stage 0 permutations will be exactly the same as
those performable using any of the three other stage 0 permutations.
Thus P(8) is equal to the lower bound given in Theorem 6.3.

□
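
The value given by Theorem 6.5 can be reproduced by exhaustive search.
The sketch below is an illustration, not the enumeration procedure of
the original work. By Lemma 6.1 it composes only stage permutations,
taking stages in the order n-1 down to 0, and it keeps the distinct
results in a set.

```python
from itertools import product


def stage_permutations(i, N):
    """All distinct stage i permutations, assuming one link per cell."""
    step = 1 << i
    perms = set()
    for choice in product((-1, 0, 1), repeat=N):
        f = tuple((k + c * step) % N for k, c in enumerate(choice))
        if len(set(f)) == N:
            perms.add(f)
    return perms


def adm_permutations(n):
    """Distinct overall permutations of an ADM network with N = 2**n inputs."""
    N = 1 << n
    overall = {tuple(range(N))}
    for i in reversed(range(n)):                 # stages n-1 .. 0
        stage = stage_permutations(i, N)
        overall = {tuple(f[x] for x in p) for p in overall for f in stage}
    return overall


print(len(adm_permutations(2)))                  # 24
print(len(adm_permutations(3)))                  # 26496
print(24 ** 2 * (49 - 3))                        # 26496
```

For N = 8 the search composes 16 * 81 * 49 = 63,504 settings, of which
26,496 give distinct permutations. The same routine with n = 2 confirms
Lemma 6.5.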

Because the ADM network with N = 8 cannot perform all N! = 40,320
permutations (P(8) = 26,496), the method of Theorem 6.5 does not extend
directly to larger values of N.

Corollary 6.1: The number of distinct permutations performable by the
N-input ADM is bounded by

P_L(N/2)^2 * [P_0(N) - 3] ≤ P(N) < P_U(N/2)^2 * P_0(N)

where N > 8. Also

P_L(8) = P_U(8) = P(8) = 26,496.

Proof: This corollary follows from Theorems 6.3, 6.4, and 6.5. P(N) is
strictly less than P_U(N) because there exist overall permutations for
which multiple distinct input permutation and stage 0 permutation
compositions result in the same overall permutation. For example an
overall permutation of input i to output (i+1) modulo N, 0 ≤ i < N, can
be done with stage 0 set to all +2^0, all -2^0, or all regular
exchanges.

□


6.4 Tightness and Asymptotic Behavior of the Bounds

The suitability of the bounds given in Corollary 6.1 as a measure
of ADM network performance will depend on how tight the bounds are for
various values of N, or network size. The less the difference between
the upper and lower bounds the more useful they are as an indicator of
ADM network performance. The tightness of the bounds stated in
Corollary 6.1 can be calculated as a function of the number of inputs to
the network. Define the spread of the bounds, S(N), to be

S(N) = [P_U(N) - P_L(N)] / P_L(N)

Using this formula and the results of Corollary 6.1 and Theorem 6.5,
Table 6.1 is calculated. The number of permutations performable by the
Generalized Cube network (see Chapter 5) is included in Table 6.1 for
comparison.

Let $x = P_L(N_0)$ and $x + \Delta = P_U(N_0)$, where $N_0 = 2^i$ for
$i \in \{0, 1, 2, \ldots\}$. Then for an ADM network with $2 N_0$ inputs
the spread is approximately

$$S(2 N_0) \approx \frac{(x+\Delta)^2 \cdot P_0(2 N_0) - x^2 \cdot P_0(2 N_0)}{x^2 \cdot P_0(2 N_0)}$$

Since $P_0(32) = 4{,}870{,}349$, this is a good approximation for
$N_0 \ge 32$. For an arbitrarily large ADM network the spread, using the
approximation, is in general

$$S(N) \approx \left(\frac{x+\Delta}{x}\right)^{2^i} - 1$$

where $N = 2^i \cdot N_0$, and $i \in \{0, 1, 2, \ldots\}$. As network
size increases without bound,

$$\lim_{N \to \infty} S(N) = \infty$$

Thus the bound becomes a less precise measure of the capability of an
ADM network as network size increases. However, as shown in Table 6.1,
the value of S(N) is small for networks of considerable size. So, for
practical values of N, the bounds given in Corollary 6.1 give a useful
approximation of the number of overall permutations performable by the
ADM network.
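
The growth of the spread can be made explicit with a short derivation. The following is a sketch that assumes only the bounds of Corollary 6.1 and replaces $P_0(N) - 3$ by $P_0(N)$ in the lower bound, which is reasonable once $P_0(N)$ is large. The shorthand $r(N) = P_U(N)/P_L(N)$ is introduced here for illustration and is not part of the original analysis.

```latex
\begin{align*}
S(N) &= \frac{P_U(N) - P_L(N)}{P_L(N)} = r(N) - 1,
  \qquad r(N) = \frac{P_U(N)}{P_L(N)} \\
r(2N_0) &= \frac{P_U(N_0)^2 \, P_0(2N_0)}{P_L(N_0)^2 \, [P_0(2N_0) - 3]}
  \approx r(N_0)^2 \\
r(2^i N_0) &\approx r(N_0)^{2^i}
  \quad\Longrightarrow\quad
  S(2^i N_0) \approx \bigl(1 + S(N_0)\bigr)^{2^i} - 1
\end{align*}
```

Under this approximation the ratio of the bounds squares with each doubling of network size, so any $S(N_0) > 0$ eventually gives an unbounded spread, even though it starts from a small value.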
Table 6.1  $P_L(N)$ and $P_U(N)$ are the lower and upper bounds on the number
of permutations performable by an N-input ADM network,
respectively. S(N) is the spread of the bounds calculated
as $[P_U(N) - P_L(N)]/P_L(N)$. Cube(N) is the number of permutations
performable by an N-input Generalized Cube network,
given by $2^{Nn/2}$.
6.5 Conclusions

The type of interconnection network chosen for an SIMD machine will
have far-reaching consequences for the ultimate system performance, or
lack of it. Comparison of various candidate interconnection networks
can involve many factors, such as cost and partitionability.

This chapter has considered the number of permutations performable
by the ADM network. For the ADM network, counting the number of
performable permutations is made difficult by the fact that the network has
settings which do not yield a permutation. Also, the network has multiple
settings for certain passable permutations.

A method was given for counting the number of stage i configurations
which are permutations, for any size ADM network, 0 ≤ i < n. Using
partitioning theory in a divide and conquer approach led to an upper
and lower bound on the number of distinct overall permutations which an
ADM network can perform. To assess the characteristics of the bounds,
their tightness and asymptotic behavior were investigated. For the special
case N = 8, an exact count of the number of distinct overall
permutations performable was proven. Finally, a comparison of the number of
distinct permutations performable by the ADM and Generalized Cube
networks was made.
CHAPTER 7

GENERALIZED CUBE PERFORMANCE WITH ROUTING TAGS

7.1 Introduction

Another measure of the utility of a particular interconnection network
is its ability to operate without centralized control. For SIMD
machines  with  a  large  number  of  processing  elements,  centralized  control 
of  the  interconnection  network  may  cause  that  component  of  the  system  to 
become  a  bottleneck. 

One  way  to  distribute  control  of  the  interconnection  network  among 
the  N  PEs  is  to  use  routing  tags.  Each  PE  first  computes  a  routing  tag 
for  the  data  item  it  will  send  through  the  network.  Then  at  each 
switching  cell,  logic  circuits,  capable  of  using  the  information  of  the 
routing  tag  to  control  the  cell  setting,  select  an  appropriate  path  so 
the  data  item  will  reach  the  desired  destination.  With  this  scheme  the 
overhead  time  needed  to  establish  network  settings  is  independent  of 
network  size.  Therefore,  an  important  consideration  when  comparing  the 
relative  merits  of  various  interconnection  networks  is  the  nature  and 
capabilities  of  the  routing  tags  compatible  with  each  design. 

This chapter considers the operation of the Generalized Cube network
using routing tags. A representative tag scheme is chosen and network
performance studied. The results obtained are used for comparison
with the ADM network.
7.2 Routing Tag Operation

There  is  only  one  route  between  any  source  and  destination  in  the 
Generalized  Cube  network  (see  Theorem  5.1).  Consider  a  routing  tag 
scheme  for  the  network  which  correctly  specifies  the  one  route  for  any 
source/destination  pair.  Such  a  routing  tag  scheme  will  generate  the 
correct  set  of  N  routes  for  any  permutation  which  is  passable  by  the 
Generalized  Cube.  Thus,  the  full  permuting  capabilities  of  the  network 
are  available  with  such  routing  tags. 

Several routing tag schemes exist which give the correct route for
any source/destination pair. One possibility is to generate a tag, T,
according to $T = (t_{n-1} \ldots t_0) = S \oplus D$ [MCM]. The tag is
interpreted in the following way. If $t_i = 0$ then the interchange box
with the input $(d_{n-1} \ldots d_{i+1} s_i \ldots s_0)$ is set to
straight. This is the interchange box in stage i through which S is
mapped to D (see Theorem 5.1). If $t_i = 1$, the interchange box is set
to exchange. For example, if S = 0101 and D = 1001 then T = 1100 and the
interchange box settings are exchange, exchange, straight, and straight.

This routing tag scheme uses easily computed tags of n bits. Because
the exclusive-or operation is commutative, the tag mapping D to S
is the same as that for S to D. This allows handshaking to be performed
easily, if desired.
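
The exclusive-or scheme can be illustrated with a short Python sketch. It assumes that an exchange setting in stage i moves a data item to the line whose address differs in bit i, and that stages are traversed from n-1 down to 0. The function name `cube_route` and its return format are choices made here for illustration.

```python
def cube_route(source: int, dest: int, n: int):
    """Trace one message through an n-stage Generalized Cube using T = S xor D.

    At stage i the interchange box is set to exchange when tag bit t_i is 1
    and to straight when it is 0. Returns the tag, the per-stage settings
    (stage n-1 first), and the address reached.
    """
    tag = source ^ dest
    address = source
    settings = []
    for i in range(n - 1, -1, -1):
        if (tag >> i) & 1:
            address ^= 1 << i          # assumed model: exchange flips address bit i
            settings.append("exchange")
        else:
            settings.append("straight")
    return tag, settings, address


tag, settings, arrived = cube_route(0b0101, 0b1001, n=4)
print(format(tag, "04b"), settings, format(arrived, "04b"))
```

Calling `cube_route` with source and destination swapped yields the same tag, which is the symmetry that makes handshaking straightforward.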

Other routing tag schemes are possible. In [LA], a 2n bit routing
tag consisting of the n-bit source and destination addresses is
described which is also suitable for use with the Generalized Cube
network. With these tags a processing element receiving data can compare
its address with that given in the tag to detect network errors. Another
scheme allowing certain kinds of broadcasting is presented in
[MCM/WE].

7.3 Conclusions

The  Generalized  Cube  network  is  well  suited  for  distributed  control 
using  routing  tags.  Any  routing  tag  scheme  which  can  specify  the  route 
between any source/destination pair allows unrestricted use of the
permuting capabilities of the network.
CHAPTER 8

AUGMENTED DATA MANIPULATOR PERFORMANCE WITH ROUTING TAGS

8.1 Introduction

The importance of distributed network control was discussed in
Section 7.1. In this chapter various routing tag schemes for controlling
the ADM network are reviewed. These schemes were presented in
[MS/SSMA].

One family of routing tags discussed does not allow unrestricted
operation of the ADM network. However, these tags have the compelling
characteristic of easy computability. Thus, more precise knowledge of
the performance of these tags in an SIMD environment will aid in
evaluating their feasibility, as well as that of the ADM network.

8.2 Routing Tag Schemes

The routing tag schemes discussed in this chapter are defined in
[MS]. To characterize an arbitrary path in the ADM network a full
routing tag is required. A full routing tag requires 2n bits and is of
the form $F = (f_{2n-1} f_{2n-2} \ldots f_1 f_0)$. It can be defined such
that the even numbered bits represent the magnitude of the route to be
taken within a particular stage and the odd numbered bits represent the
sign. That is, if $f_{2i} = 0$, the straight link in stage i is used
regardless of the value of $f_{2i+1}$. If $f_{2i} = 1$ and $f_{2i+1} = 0$,
the $+2^i$ link is used. If $f_{2i} = 1$ and $f_{2i+1} = 1$, the $-2^i$
link is used. Control of the ADM network is distributed by constructing
each cell with sufficient logic to examine bits $f_{2i+1}$ and $f_{2i}$
of a routing tag and make the appropriate link selection. For example,
with N = 16, and if the source is 5 and the destination is 12, one
possible value for F is 10001110. The route taken is $+2^3$, straight,
$-2^1$, $+2^0$. The use of full routing tags allows the ADM to
perform any passable permutation. However, no function or algorithm of
reasonable complexity is known which will give a set of N
non-conflicting tags for all permutations passable by the ADM network. So,
less flexible, but easily computed, routing tag schemes have been
developed.
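
As an illustration of how a cell might interpret a full routing tag, the sketch below decodes a tag using the bit assignment stated in the definition ($f_{2i}$ as magnitude, $f_{2i+1}$ as sign). The function name `decode_full_tag` and the tag string 01001101 are assumptions made here: that string is what the stated bit assignment yields for the route $+2^3$, straight, $-2^1$, $+2^0$. The value 10001110 quoted in the text describes the same route when each stage's pair of bits is read left to right as (magnitude, sign).

```python
def decode_full_tag(bits: str, source: int):
    """Follow a 2n-bit full routing tag written as the string f_{2n-1} ... f_0.

    Bit f_{2i} is the magnitude for stage i and bit f_{2i+1} is its sign
    (0 selects +2^i, 1 selects -2^i). Returns the route and the final address.
    """
    n = len(bits) // 2
    N = 1 << n
    f = [int(b) for b in reversed(bits)]   # f[k] holds bit f_k
    address, route = source, []
    for i in range(n - 1, -1, -1):         # stages n-1 down to 0
        if f[2 * i] == 0:
            route.append("straight")
        elif f[2 * i + 1] == 0:
            address = (address + (1 << i)) % N
            route.append(f"+2^{i}")
        else:
            address = (address - (1 << i)) % N
            route.append(f"-2^{i}")
    return route, address


route, arrived = decode_full_tag("01001101", source=5)
print(route, arrived)
```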

If all the sign bits in a full routing tag are the same, the
information contained in those bits can be represented by one bit which is
the sign bit for the whole tag. An n+1 bit routing tag, T, can be
formed by computing the signed magnitude difference between the
destination, D, and the source, S, such that
$T = (t_n t_{n-1} \ldots t_1 t_0) = D - S = (d_{n-1} \ldots d_0) - (s_{n-1} \ldots s_0)$.
The sign bit is $t_n$, where $t_n = 0$ indicates positive and $t_n = 1$
indicates negative. Bits $t_{n-1} \ldots t_0$ equal the magnitude or
absolute value of D-S. If all N routing tags for a
permutation are calculated in this way, then the permutation is said to
be routed using natural permutation routing tags [SM2]. A natural
routing tag consisting of only straight or $+2^i$-type connections is said to
be positive dominant. A routing tag with only straight or $-2^i$-type
connections is negative dominant. A given natural routing tag must be
either positive or negative dominant.

To execute this scheme bits $t_n$ and $t_i$ are examined to determine the
link to be used at stage i. If $t_i = 0$, the straight link is used
regardless of the value of $t_n$. If $t_i = 1$ and $t_n = 0$, the $+2^i$
link is used, and if $t_i = 1$ and $t_n = 1$, the $-2^i$ link is used.
With N = 16, and if the source and destination are again 5 and 12,
respectively, then T = 12-5 = 00111. The route taken is straight, $+2^2$,
$+2^1$, $+2^0$.
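
The computation and use of a natural routing tag can be sketched as follows. The helper names `natural_tag` and `follow_tag` are hypothetical, and tags are represented as bit strings $t_n t_{n-1} \ldots t_0$ purely for readability.

```python
def natural_tag(source: int, dest: int, n: int) -> str:
    """Return the (n+1)-bit signed-magnitude tag t_n t_{n-1} ... t_0 for D - S."""
    diff = dest - source
    sign = 1 if diff < 0 else 0
    return f"{sign}{abs(diff):0{n}b}"


def follow_tag(tag: str, source: int):
    """Apply a signed-magnitude tag stage by stage, from stage n-1 down to 0."""
    n = len(tag) - 1
    N = 1 << n
    sign, magnitude = int(tag[0]), int(tag[1:], 2)
    address, route = source, []
    for i in range(n - 1, -1, -1):
        if not (magnitude >> i) & 1:
            route.append("straight")       # t_i = 0: sign bit is ignored
        elif sign == 0:
            address = (address + (1 << i)) % N
            route.append(f"+2^{i}")
        else:
            address = (address - (1 << i)) % N
            route.append(f"-2^{i}")
    return route, address


tag = natural_tag(5, 12, n=4)
print(tag, follow_tag(tag, 5))
```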

Clearly,  for  any  arbitrary  source/destination  pair  there  exists  a 
corresponding  natural  routing  tag  and  a  tag-specified  connection  path 
through  the  ADM  network.  Also,  natural  tags  are  easily  computed.  These 
properties indicate that natural routing tags may be suitable for
control of the ADM network in an MIMD environment [MS].

In [MS] it is also shown that a natural routing tag and its two's
complement are equivalent in that they both route the same source to the
same destination. Letting T' be the two's complement of T, then from
the previous example T' = 11001. Input 5 is still connected to output
12 but the route taken is $-2^3$, straight, straight, $-2^0$.

As a consequence of its definition, any natural routing tag must be
either positive or negative dominant (this is determined by the value of
$t_n$). The two's complement of any tag must have the opposite of the sign
bit of the original unless T is zero, in which case T' = T. However,
the all zero tag is both positive and negative dominant. Thus a
positive dominant routing tag and a negative dominant routing tag must exist
for any source/destination pair. So positive dominant and negative
dominant routing tags can be used in an MIMD environment.
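
The equivalence of a tag and its two's complement can be checked numerically. In the sketch below the function names are hypothetical, tags are (n+1)-bit strings, and `endpoint` simply adds or subtracts the magnitude modulo N rather than simulating individual stages.

```python
def twos_complement(tag: str) -> str:
    """Two's complement of an (n+1)-bit tag, returned as an (n+1)-bit string."""
    width = len(tag)
    return format((-int(tag, 2)) % (1 << width), f"0{width}b")


def endpoint(tag: str, source: int) -> int:
    """Destination reached from `source` when the tag is read as sign and magnitude."""
    n = len(tag) - 1
    magnitude = int(tag[1:], 2)
    step = -magnitude if tag[0] == "1" else magnitude
    return (source + step) % (1 << n)


def dominant_tags(source: int, dest: int, n: int):
    """Return (positive dominant tag, negative dominant tag) for one pair."""
    N = 1 << n
    positive = format((dest - source) % N, f"0{n + 1}b")   # sign bit is 0
    negative = twos_complement(positive)                   # all-zero tag maps to itself
    return positive, negative


print(dominant_tags(5, 12, n=4))
```

For the running example this reproduces T = 00111 and T' = 11001, and both tags reach output 12 from input 5.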

In an SIMD environment, however, the desirable characteristics of a
routing tag scheme are more restrictive. For good utility routing tags
should, in addition to requiring minimal computation for their
generation, enable passage of needed overall permutations. That is, ideally,
N non-conflicting paths should be specified when a network performable
permutation is needed. If all N tags are calculated as natural routing
tags, then the permutation is said to be routed using natural
permutation routing tags. A permutation is said to be routed using
positive dominant permutation routing tags if those tags that are
negative dominant in the set of natural permutation routing tags are
converted to positive dominant. Negative dominant permutation routing tags
are defined correspondingly.

For most cases the ADM network provides two or more distinct routes
between a source and its destination. Only when a source and
destination have the same address is there only one route (for a proof of this
see Lemma 10.1). The n+1 bit routing tag scheme described above can
specify at most two distinct routes between a source/destination pair:
one positive dominant and one negative dominant. If more routes exist
they cannot be exploited with this scheme. However, the ease of
generating these tags may make them useful in many instances, despite their
limitations.

8.3 The Number of Permutations Passable Using Positive or Negative
Dominant Tags

In Chapter 6 partitioning theory was applied to the ADM network at
stage 0 to give two independent subnetworks on stages n-1 through 1 (see
Figures 6.6 and 6.7). That is, the output cells of these subnetworks are
stage 1 output cells. The subnetworks so created have the structure of
N/2-input ADM networks. Because of this each of the subnetworks may in
turn be partitioned in the same manner as the whole network. This can
be seen as partitioning the entire network at stage 1. Two independent
subnetworks will be created on stages n-1 through 2 from each of the
subnetworks on stages n-1 through 1. This gives four independent
subnetworks whose output cells are stage 2 output cells and which each have
the structure of an N/4-input ADM network.

This process may be repeated until N/2 independent subnetworks
whose output cells are stage n-1 output cells are generated. These
subnetworks have the structure of a 2-input ADM network. At each stage i
there are $2^i$ independent subnetworks created by this process. Summing
over all stages gives the total number of subnetworks generated as

$$\sum_{i=0}^{n-1} 2^i = N - 1.$$

Note that by including the term for i = 0 in the summation, the entire
network has been considered a subnetwork of itself.

Since each subnetwork has the structure of an ADM network, it
has the equivalent of a stage 0. This stage 0 will consist of all stage
i input and output cells and their straight, $PM2_{+i}$, and $PM2_{-i}$
links contained in a given subnetwork which is created by the partitioning
process at stage i. This set of cells and links is called the subnetwork
stage 0.

The number of overall permutations passable by the ADM network using
positive dominant permutation routing tags can now be counted.

Theorem 8.1: The ADM network can pass $2^{N-1}$ distinct overall
permutations using positive dominant permutation routing tags.

Proof: Consider stage 0 of the entire network. Any exchange connection
involves the use of $+2^0$ and $-2^0$ links. The $-2^0$ link cannot be used by
positive dominant tags, so no exchanges can be performed in stage 0 when
positive dominant tags are used. For the same reason, the all $-2^0$
setting cannot be performed. This leaves only the all straight and all $+2^0$
settings, which can be performed with positive dominant tags.

For each of the subnetworks the same reasoning applies. The
subnetwork stage 0 for each subnetwork may be set only to all straight or
all $+2^i$. However, each subnetwork stage 0 may be set independently
because the subnetworks are independent (this includes stage 0 of the
entire network). Since there are N-1 subnetworks there are $2^{N-1}$ distinct
settings of the entire network.

Since the setting of each subnetwork stage 0 is a permutation, the
setting for the entire network performs a permutation. Consider again
stage 0 of the entire network. The all straight and all $+2^0$ settings do
not have the same pairings of source subnetwork with network outputs
(see Lemmas 6.6 and 6.7). The concept of source subnetwork can be
extended naturally to each of the subnetworks created by the partitioning
process by recalling that any subnetwork has the structure of an ADM
network with the appropriate number of inputs. So in an analogous
manner, the all straight and all $+2^i$ settings on each subnetwork stage 0
will not have the same pairings of source subnetwork with subnetwork
outputs.
Thus, any change in a subnetwork stage 0 setting will cause at
least one subnetwork output cell to be mapped from a different source
subnetwork. The source subnetworks for a given subnetwork are
independent. So changing a subnetwork stage 0 setting must give a different
permutation. Thus the $2^{N-1}$ settings correspond to $2^{N-1}$ distinct
passable overall permutations.

□

Corollary 8.1: The ADM can pass $2^{N-1}$ distinct overall permutations using
negative dominant permutation routing tags.

Proof: The proof is the same as for Theorem 8.1 except that the possible
subnetwork stage 0 settings are all straight or all $-2^i$.

□
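
For small networks the counts in Theorem 8.1 and Corollary 8.1 can be checked by exhaustive search. The sketch below is not the counting argument of the proof; it assumes a simple model in which data moves through stages n-1 down to 0 and a permutation is passable when no two data items occupy the same cell after any stage. The function names are hypothetical.

```python
from itertools import permutations


def passes(dest, sign: int) -> bool:
    """True if every stage maps the N data items to N distinct cells.

    sign = +1 uses positive dominant tags (only straight and +2^i links);
    sign = -1 uses negative dominant tags (only straight and -2^i links).
    """
    N = len(dest)
    n = N.bit_length() - 1
    magnitude = [(sign * (d - s)) % N for s, d in enumerate(dest)]
    position = list(range(N))
    for i in range(n - 1, -1, -1):                  # stages n-1 down to 0
        for s in range(N):
            if (magnitude[s] >> i) & 1:
                position[s] = (position[s] + sign * (1 << i)) % N
        if len(set(position)) < N:                  # two items share a cell
            return False
    return True


def count_passable(N: int, sign: int) -> int:
    """Count permutations of N items passable with tags of the given dominance."""
    return sum(passes(p, sign) for p in permutations(range(N)))


for N in (2, 4, 8):
    print(N, count_passable(N, +1), count_passable(N, -1), 2 ** (N - 1))
```

The last column prints the value $2^{N-1}$ predicted by the theorem for comparison. Exhaustive search is only practical for very small N, since the number of candidate permutations grows as N!.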

When natural permutation routing tags are used additional
permutations can be performed on the ADM network. The perfect shuffle is an
example of one such permutation [MAS].

The number of permutations performable using natural tags can be
bounded by considering the following. Some of the $PM2_{+i}$ links of stage
i of the ADM network connect a stage i input cell j to a stage i output
cell k where j > k. Also there are $PM2_{-i}$ links which connect l to m
where l < m. These types of connections are called wraparound connections.
In Figure 4.2 these links are the ones drawn in two parts indicated by
letters.
Lemma 8.1: ADM network wraparound connections are not used when the
network is controlled using natural routing tags.

Proof: By definition, a natural routing tag, T, is computed as T = D-S,
where S and D represent the source and destination addresses,
respectively. If S ≤ D, then T ≥ 0 and the route which will be used is
positive dominant. This means that at any stage i in the network data
routed from S to D will pass through cells with addresses $A_i$ such that
$S \le A_i \le D$, for 0 ≤ i < n. When S > D, $A_i$ is such that
$S \ge A_i \ge D$, for 0 ≤ i < n. Thus no wraparound connections are used.

□
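
Lemma 8.1 can be illustrated by tracing every natural route in a small network and confirming that the addresses never leave the interval between source and destination. This is a sketch for N = 16 only; the function name is hypothetical, and addresses are deliberately computed without a modulo operation so that any wraparound would show up as an out-of-range value.

```python
def natural_route_cells(source: int, dest: int, n: int):
    """Cell addresses visited under a natural tag, computed without any modulo."""
    diff = dest - source
    step_sign = 1 if diff >= 0 else -1
    magnitude = abs(diff)
    address, cells = source, [source]
    for i in range(n - 1, -1, -1):                  # stages n-1 down to 0
        if (magnitude >> i) & 1:
            address += step_sign * (1 << i)
        cells.append(address)
    return cells


n = 4
N = 1 << n
never_wraps = all(
    min(s, d) <= a <= max(s, d)
    for s in range(N)
    for d in range(N)
    for a in natural_route_cells(s, d, n)
)
print(never_wraps)
```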

The number of overall permutations performable without using the
wraparound connections can be counted. Let $P_W(N)$ be the number of
permutations performable by an N-input ADM network without using wraparound
connections.

Theorem 8.2: Without using wraparound connections the number of distinct
overall permutations the ADM network can perform is

$$P_W(N) = P_W(N/2)^2 \cdot L(N-1)$$

where $P_W(2) = 2$.

Proof: The stage 0 permutations all irregular exchanges, all $+2^0$, and
all $-2^0$ cannot be performed without using wraparound connections. Also,
without wraparound all stage 0 permutations can be represented by a
characteristic binary number which need not include a bit to represent
the connections possible between cells 0 and N-1. Further, any
characteristic binary number with no linearly adjacent 1s will
correspond to a unique stage 0 permutation. Deleting the bit for
wraparound connections gives an N-1 bit characteristic binary number of
which L(N-1) have no linearly adjacent 1s. There are two independent
source subnetworks for stage 0. Each must not use any wraparound
connections if the entire network is not to use any. Since each subnetwork
has the structure of an N/2-input ADM network, each can perform $P_W(N/2)$
permutations without using any wraparound connections. Thus, the number
of input permutations is $P_W(N/2)^2$. Because the allowable stage 0
permutations each have different pairings of source subnetworks with network
outputs, $P_W(N) = P_W(N/2)^2 \cdot L(N-1)$.

The value for $P_W(2)$ is found by direct enumeration.

□ 
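
The recurrence of Theorem 8.2 is easy to evaluate numerically. In the sketch below, L(k) is computed as the number of k-bit binary numbers with no two linearly adjacent 1s, following the description in the proof. The dynamic-programming formulation and the Python names are choices made here, not taken from the text.

```python
from functools import lru_cache


def L(k: int) -> int:
    """Number of k-bit binary numbers with no two linearly adjacent 1s."""
    ending_in_0, ending_in_1 = 1, 0        # the empty string counts as ending in 0
    for _ in range(k):
        # a 0 may follow anything; a 1 may only follow a 0
        ending_in_0, ending_in_1 = ending_in_0 + ending_in_1, ending_in_0
    return ending_in_0 + ending_in_1


@lru_cache(maxsize=None)
def P_W(N: int) -> int:
    """Recurrence of Theorem 8.2 with P_W(2) = 2; N must be a power of two, N >= 2."""
    if N == 2:
        return 2
    return P_W(N // 2) ** 2 * L(N - 1)


for N in (2, 4, 8, 16):
    print(N, P_W(N))
```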


Let $P_{Nat}(N)$ be the number of permutations performable by an N-input
ADM network using natural permutation routing tags.


Theorem 8.3: The number of permutations performable by the N-input ADM
network using natural permutation routing tags is bounded by

$$P_{Nat}(N/2)^2 < P_{Nat}(N) < P_W(N)$$

where $P_{Nat}(2) = 2$.


Proof: By Lemma 8.1, natural routing tags do not use wraparound
connections. Thus, by Theorem 8.2, $P_{Nat}(N) \le P_W(N)$. The inequality is
strictly less than because there are permutations passable without using any
wraparound connections but which cannot be done using natural routing tag
routes. Figure 8.1 shows an example of this.


[Figure 8.1: an overall permutation passable without wraparound
connections that is not performable using natural routing tags.]
The lower bound can be achieved by setting stage 0 to all straight.
There are then $P_{Nat}(N/2)^2$ input permutations using only natural
permutation routing tags. However, other settings for stage 0 may be possible
depending on the input permutation. If stages n-1 through 1 are each
set to all straight, then any stage 0 setting not using wraparound
connections would give a network setting which consisted of natural routes.
By Lemma 6.7, any two stage 0 settings not using the wraparound
connections (i.e., not all irregular exchanges, all $+2^0$, or all $-2^0$) will have
different pairings of source subnetworks with network outputs. Thus
changing the stage 0 setting will give a distinct overall permutation.
So $P_{Nat}(N/2)^2$ is strictly less than $P_{Nat}(N)$.


□ 


8.4 Conclusions

Routing tags for the ADM network were discussed. Full routing tags
allow any route to be specified. Natural, positive dominant, and
negative dominant permutation routing tags, while unable to specify
arbitrary routes, are more easily computed than full tags, in general.

The number of overall permutations passable using positive and
negative dominant tags was established. The number of overall permutations
passable using natural tags was bounded.
CHAPTER 9

ALGORITHMS FOR DETERMINING PERMUTATION PASSABILITY
ON THE AUGMENTED DATA MANIPULATOR

9.1 Introduction

Chapter 8 discussed some of the limitations of natural, positive
dominant, and negative dominant permutation routing tags. A count of
the number of permutations passable using either positive or negative
dominant tags was given, and the number performable using natural
permutation routing tags was bounded. However, these results do not identify
the overall permutations passable using these tags. The next section
presents an algorithm which can be used to determine if a given
arbitrary overall permutation is passable using natural, positive dominant,
or negative dominant permutation routing tags.

9.2 The Algorithms

The procedure to check the passability of a given overall
permutation using any one of the routing tag schemes discussed in Section 8.3
is given in Algorithm 9.1. In the procedure, the variable dest is an
N-element vector where dest(j) is the destination associated with source
j, for 0 ≤ j < N. The notation $X_i$ is used to denote bit i of X.

To understand the operation of the algorithm, first recall that
only stage 0 can affect the least significant bit of a source address.
If $s_0 = d_0$ then stage 0 must be set to the straight link at the output
cell with the destination address. If $s_0 \ne d_0$ then stage 0 must be set
to a PM2I-type link at that cell.

    procedure PASSABILITY(dest, N);
        passable = true;
        while passable = true do
            for i = 0 step 1 until log2(N)-2 do
                check = 0;    /* check is an N-element vector */
                for j = 0 step 1 until N-1 do
                    if dest(j)_i ≠ j_i
                        then REQUEST;
                    if check(dest(j)) = 0
                        then check(dest(j)) = 1;
                        else passable = false;
                if passable = false then stop;

Algorithm 9.1: Procedure PASSABILITY

Because  natural  routing  tags  specify  a  single  specific  path  from  a 
source  to  its  destination,  at  each  stage  of  the  network  there  is  only 
one  cell  through  which  a  data  item  can  pass.  The  N  routing  tags  of  the 
type  selected  must  require  that  each  data  item  pass  through  a  distinct 
cell  at  each  stage,  if  a  permutation  is  to  be  passable  using  these  tags. 
This is the criterion used by the algorithm to decide passability (see
Lemma 6.1).

Before  proceeding  with  the  analysis  of  the  algorithm  it  is  useful 
to  define  two  terms.  Switching  cells  which  select  the  connections  to  be 
used  in  stage  i  are  called  stage  i  input  cells.  Cells  which  are  linked 
to  by  stage  i  input  cells  are  stage  i  output  cells.  Note  that  stage  i+1 
output  cells  are  stage  i  input  cells. 

The algorithm begins by comparing bits $s_0$ and $d_0$ of a source, j,
and its destination, dest(j), where 0 ≤ j < N. If $s_0 = d_0$ a straight
connection must be used to link a stage 0 input cell to the stage 0
output cell whose address is that of the destination. The address of the
stage 0 input cell is then dest(j). A straight link in stage 0 is
selected by retaining the current dest(j) value as the updated dest(j).
This is because the data item must pass through the stage 0 input cell
specified by the updated value of dest(j).

If $s_0 \ne d_0$ then a PM2I-type connection must be used to link a stage
0 input cell to the correct output cell. In this case the address of
the stage 0 input cell will be either $(dest(j) + 2^0)$ modulo N or
$(dest(j) - 2^0)$ modulo N, where the $-2^0$ and $+2^0$ links are used,
respectively.
Positive dominant routing tags can only use the $+2^i$ links of the
PM2I-type connections. Negative dominant routing tags can only use the
$-2^i$ links. So, updating dest(j) to one of the two possible stage 0 input
cell address values serves to specify the $+2^0$ or $-2^0$ link. Updating
dest(j) proceeds according to the instruction of REQUEST, three variations
of which are listed in Algorithm 9.2. A different version of REQUEST
exists for each type of tag that may be desired.

Consider the version for positive dominant permutation routing
tags. When $s_0 \ne d_0$ the value of dest(j) is replaced by
$(dest(j) - 2^0)$ modulo N, which specifies the $+2^0$ link. Again, the data
item must pass through the stage 0 input cell specified by the updated
value of dest(j).

After each dest(j) is updated, the N-element vector, check, which
was initialized to all zeros, is used to indicate which stage 1 output
cell (same as stage 0 input cell) has been selected by the new dest(j).
The element check(dest(j)) is examined to see if it is a zero. If so,
it is set to 1 to indicate that a data item will pass through this
particular stage 1 output cell. If, however, the element of check has
already been set to 1 this indicates a conflict of data items and the
given permutation is not passable using positive dominant permutation
routing tags. The logic variable, passable, is set to false in this
circumstance, and the algorithm terminates.

If bits $s_0$ and $d_0$ have been compared for each source/destination
pair, and each dest(j) has been updated without conflict, then the check
vector is reinitialized and the procedure begins anew for stage 1. This
time bits $s_1$ and $d_1$ of each source/updated-destination pair are compared
to determine the connections to use in stage 1. Since the setting of
stage 0 has been established at this point, only stage 1 can have any
effect on these bits.

    For positive dominant routing tags, REQUEST is:
        dest(j) = (dest(j) - 2^i) modulo N

    For negative dominant routing tags, REQUEST is:
        dest(j) = (dest(j) + 2^i) modulo N

    For natural routing tags, REQUEST is:
        if (dest(j) - j) > 0 then dest(j) = (dest(j) - 2^i) modulo N
        else dest(j) = (dest(j) + 2^i) modulo N

Algorithm 9.2: Versions of REQUEST.

The procedure is repeated for successive stages until either a
conflict is detected, in which case the permutation is not passable and the
algorithm terminates, or the connections in stage n-2 are found to be
conflict free, implying the permutation is passable.

Stage n-1 need not be checked. To see why, consider the case where the permutation under test is conflict free in stages 0 through n-2.

Since an overall permutation is being tested, if it is passable the stage n-1 configuration must be a stage n-1 permutation, as a consequence of Lemma 6.1. Recall from Chapter 4 that $PM2_{+(n-1)} = PM2_{-(n-1)}$. That is, each cell, A, can only be mapped to itself or to $A+2^{n-1}$ modulo N = $A-2^{n-1}$ modulo N. So, a stage n-1 configuration is a stage n-1 permutation if and only if either the identity permutation or an exchange is performed on each pair of cells, A and $A+2^{n-1}$ modulo N, for $0 \le A < N/2$. The identity permutation can be performed on any of the pairs of cells using routing tags of either dominance. The exchange permutation can be performed on any of the pairs of cells using positive dominant tags because A maps to $A+2^{n-1}$ modulo N via a $+2^{n-1}$ link and the $+2^{n-1}$ link from cell $A+2^{n-1}$ modulo N will map that cell to $(A+2^{n-1}) + 2^{n-1}$ modulo N = $A+N$ modulo N = A, completing the exchange. Note that negative dominant tags can perform these two permutations on any of the pairs of cells as well.


Connections in stage n-1 which are not stage n-1 permutations are not needed if the permutation under test is conflict free through stage n-2. There are two non-permutation settings possible on a pair of cells A and $A+2^{n-1}$ modulo N in stage n-1 which involve a single link from each cell. One connects A and $A+2^{n-1}$ modulo N to stage n-1 output cell A; the other connects the two cells to $A+2^{n-1}$ modulo N. If the algorithm checked stage n-1 it would specify one of these two connections only if dest(A) = dest($A+2^{n-1}$ modulo N), which cannot occur when there is no conflict through stage n-2. Settings involving none, two, or three links from either A or $A+2^{n-1}$ modulo N in stage n-1 are not needed because positive dominant routing tags (as well as negative dominant routing tags) define a single route from a source to destination.

Connections not performable by stage n-1, for example, A maps to stage n-1 output cell B where B is neither A nor $A+2^{n-1}$ modulo N, are also not needed to complete the network setting. After checking stage n-2, the algorithm will have updated the original values of dest(A) so that bits 0 through n-2 of A and dest(A) are the same. Thus, either A = dest(A) or they differ by $2^{n-1}$. In either case, existing links in stage n-1 can perform the connection. Thus only a stage n-1 configuration which is a permutation will be required. Since the needed stage n-1 permutation can always be performed, that stage need not be checked.
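The procedure described above can be sketched in Python as follows. This is a minimal sketch written from the prose description, not a transcription of Algorithm 9.1. The names `request` and `passable`, the `mode` strings, and the representation of the permutation as a list `perm` (with `perm[j]` the destination of source `j`) are assumptions made for illustration, and N is assumed to be a power of two.

```python
def request(dest_j, j, i, N, mode):
    """One REQUEST step: back up across the stage i PM2I link."""
    step = 1 << i
    if mode == "positive":
        return (dest_j - step) % N
    if mode == "negative":
        return (dest_j + step) % N
    if mode == "natural":
        # Positive dominant when the (updated) destination lies above the source.
        if dest_j - j > 0:
            return (dest_j - step) % N
        return (dest_j + step) % N
    raise ValueError("unknown mode: " + mode)


def passable(perm, mode="positive"):
    """Return True if no conflict is found in stages 0 through n-2."""
    N = len(perm)
    n = N.bit_length() - 1           # n = log2(N); N assumed a power of two
    dest = list(perm)
    for i in range(n - 1):           # stage n-1 is not checked
        check = [0] * N              # reinitialized for every stage
        for j in range(N):
            # Compare bit i of the source with bit i of the updated destination.
            if ((j >> i) & 1) != ((dest[j] >> i) & 1):
                dest[j] = request(dest[j], j, i, N, mode)
            if check[dest[j]]:
                return False         # two data items need the same cell
            check[dest[j]] = 1
    return True
```

For N = 4, the exchange of sources 0 and 1, written `[1, 0, 2, 3]`, is rejected by this sketch for positive dominant and for negative dominant tags, but accepted for natural tags.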

The algorithm can be shown to give a correct result by demonstrating that it imitates the subtraction performed to calculate a routing tag. Expanding on the notation where $X_i$ denotes bit i of X, let $X_{i/j}$, for $i \ge j$, represent bits i through j, inclusive, of X, and let $X_{i/j} = 0$ for $i < j$. First consider the case where positive dominant permutation routing tags are desired.


Theorem 9.1: Let D be a particular destination, S its source, and $R^{(i)}$ be the revised value of "dest(j)" after the inner loop of the algorithm has been successfully completed i times. Then $R^{(i)}_{n-1/i} = (D - S_{i-1/0})_{n-1/i}$, where $0 \le i \le n-2$.

Proof: Let i = 0. Then

$$\begin{aligned}
R^{(0)}_{n-1/0} &= D_{n-1/0} && \text{; by definition, initially dest(j) = D (for S = j)} \\
&= (D - 0)_{n-1/0} \\
&= (D - S_{-1/0})_{n-1/0} && \text{; because } S_{-1/0} = 0
\end{aligned}$$

The remainder of the proof is by induction on i.

Basis step: Let i = 1.

Case 1: $S_0 = D_0$. In this case the algorithm does not change R.

$$\begin{aligned}
R^{(1)}_{n-1/1} &= R^{(0)}_{n-1/1} && \text{; because } S_0 = D_0 \text{ the test in line 7 of Algorithm 9.1 is false, so REQUEST is not executed} \\
&= D_{n-1/1} \\
&= (D - S_{0/0})_{n-1/1}
\end{aligned}$$

Case 2: $S_0 \ne D_0$. In this case the algorithm for use with positive dominant tags subtracts $2^0$ from D.

Subcase (a): $S_0 = 0$, $R^{(0)}_0 = D_0 = 1$.

$$\begin{aligned}
R^{(1)}_{n-1/1} &= (R^{(0)}_{n-1/0} - 2^0)_{n-1/1} && \text{; because } R^{(0)}_{n-1/0} = D_{n-1/0} \text{ and } S_0 \ne D_0 \text{ the test in line 7 of Algorithm 9.1 is true, so REQUEST is executed} \\
&= (R^{(0)}_{n-1/1} + 2^0 - 2^0)_{n-1/1} && \text{; since } R^{(0)}_0 = 1 \\
&= R^{(0)}_{n-1/1} \\
&= D_{n-1/1} \\
&= (D - S_{0/0})_{n-1/1} && \text{; since } S_0 = 0
\end{aligned}$$

Subcase (b): $S_0 = 1$, $R^{(0)}_0 = D_0 = 0$.

$$\begin{aligned}
R^{(1)}_{n-1/1} &= (R^{(0)}_{n-1/0} - 2^0)_{n-1/1} \\
&= (D_{n-1/0} - S_{0/0})_{n-1/1} && \text{; since } S_0 = 1 \\
&= (D - S_{0/0})_{n-1/1}
\end{aligned}$$

Induction step: Assume $R^{(i)}_{n-1/i} = (D - S_{i-1/0})_{n-1/i}$.

Case 1: $S_i = R^{(i)}_i$. In this case the algorithm does not change $R^{(i)}$.

$$\begin{aligned}
R^{(i+1)}_{n-1/i+1} &= R^{(i)}_{n-1/i+1} && \text{; because } S_i = R^{(i)}_i \text{ the test in line 7 of Algorithm 9.1 is false, so REQUEST is not executed} \\
&= (D - S_{i-1/0})_{n-1/i+1} \\
&= (D - S_{i/0})_{n-1/i+1} && \text{; since } S_i = R^{(i)}_i = (D - S_{i-1/0})_i, \text{ no borrow can propagate into the i+1 bit position}
\end{aligned}$$

Case 2: $S_i \ne R^{(i)}_i$. In this case the algorithm for use with positive dominant tags subtracts $2^i$ from $R^{(i)}$.

Subcase (a): $S_i = 0$, $R^{(i)}_i = 1$.

$$\begin{aligned}
R^{(i+1)}_{n-1/i+1} &= (R^{(i)}_{n-1/i} - 2^i)_{n-1/i+1} && \text{; because } S_i \ne R^{(i)}_i \text{ the test in line 7 of Algorithm 9.1 is true, so REQUEST is executed} \\
&= (R^{(i)}_{n-1/i+1} + 2^i - 2^i)_{n-1/i+1} && \text{; since } R^{(i)}_i = 1 \\
&= R^{(i)}_{n-1/i+1} \\
&= (D - S_{i-1/0})_{n-1/i+1} \\
&= (D - S_{i/0})_{n-1/i+1} && \text{; since } S_i = 0
\end{aligned}$$

Subcase (b): $S_i = 1$, $R^{(i)}_i = 0$.

$$\begin{aligned}
R^{(i+1)}_{n-1/i+1} &= (R^{(i)}_{n-1/i} - 2^i)_{n-1/i+1} \\
&= ((D - S_{i-1/0})_{n-1/i} - 2^i)_{n-1/i+1} \\
&= ((D - S_{i/0})_{n-1/i})_{n-1/i+1} && \text{; since } S_i = 1 \\
&= (D - S_{i/0})_{n-1/i+1}
\end{aligned}$$

□

Let |T| denote the magnitude of T. For positive dominant routing tags: |T| = |D-S| = D-S where $D \ge S$, and when D < S, |T| = the two's complement of |D-S| = the two's complement of (S-D) = N-(S-D) = N + (D-S). Thus, $T_i = (D - S_{i-1/0})_i - S_i$. Hence, $T_i = (D - S_{i-1/0})_i \oplus S_i$. From Theorem 9.1, $R^{(i)}_i = (D - S_{i-1/0})_i$. Therefore, $T_i = R^{(i)}_i \oplus S_i$. So $T_i$ and $R^{(i)}_i \oplus S_i$ are equivalent criteria. When $R^{(i)}_i \oplus S_i = 0$, that is, when $R^{(i)}_i$ and $S_i$ are the same, the algorithm selects the straight connection in stage i as does the routing tag. When $R^{(i)}_i$ and $S_i$ differ, the algorithm selects the $+2^i$ connection, as would the routing tag, since in this case $T_i = 1$. Thus the algorithm faithfully simulates the operation of the positive dominant permutation routing tags.
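The statement of Theorem 9.1 and the equivalence between tag bit i and the comparison made by the algorithm can be spot-checked exhaustively for small N. The following is a minimal sketch under stated assumptions: the helper names `bits` and `check_theorem` are hypothetical, `bits(x, hi, lo)` models the notation for bits hi through lo kept in place (zero when hi < lo), and the positive dominant tag is taken to be (D - S) modulo N.

```python
def bits(x, hi, lo):
    """Bits hi..lo of x, kept in their positions; 0 when hi < lo."""
    if hi < lo:
        return 0
    mask = ((1 << (hi + 1)) - 1) & ~((1 << lo) - 1)
    return x & mask


def check_theorem(n):
    """Brute-force check of Theorem 9.1 for every source/destination pair."""
    N = 1 << n
    for S in range(N):
        for D in range(N):
            R = D                    # R^(0) = D
            T = (D - S) % N          # positive dominant tag (assumed form)
            for i in range(n - 1):   # i = 0 .. n-2
                low_S = bits(S, i - 1, 0)
                # Claim of the theorem after i completed iterations.
                assert bits(R, n - 1, i) == bits((D - low_S) % N, n - 1, i)
                # Tag bit i equals bit i of R XOR bit i of S.
                assert (T >> i) & 1 == ((R >> i) & 1) ^ ((S >> i) & 1)
                # Positive dominant REQUEST when the bits differ.
                if ((R >> i) & 1) != ((S >> i) & 1):
                    R = (R - (1 << i)) % N
    return True
```

A call such as `check_theorem(4)` exercises all 256 source/destination pairs for N = 16. Passing such a check is evidence for small cases only and does not replace the proof.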

If negative dominant permutation routing tags are to be used, a similar argument shows that the algorithm, with the appropriate version of REQUEST, simulates this situation accurately. For natural permutation routing tags, REQUEST includes a check for positive dominance by determining if dest(j) - j is positive, i.e., greater than zero. If so, dest(j) is updated using the positive dominant PM2I connection. If not, dest(j) is revised using the negative dominant PM2I link. All other operations are as in the previous two cases.

Correct operation with natural routing tags is assured because the revised elements of the dest vector retain the positive dominant versus negative dominant information. This is so because if S is routed to D using the natural tag, and $A_i$ is the stage i output cell that the specified path maps to in stage i, then $S \le A_i \le D$ when D > S, and $S \ge A_i \ge D$ when D < S, for $0 \le i < n$. So the dominance of the natural routing tag generated by S and D can be determined from S and any of the $A_i$.
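The claim that the revised destinations stay between S and D can be illustrated with a short sketch. The function name `natural_cells` is hypothetical, and the sketch walks backward from D using the natural version of REQUEST, so the values it records are the successive revised values of dest(j) rather than an independent model of the network.

```python
def natural_cells(S, D, n):
    """Revised dest values for source S and destination D, stages 0..n-1."""
    N = 1 << n
    dest = D
    cells = []
    for i in range(n):
        if ((dest >> i) & 1) != ((S >> i) & 1):
            if dest - S > 0:
                dest = (dest - (1 << i)) % N   # positive dominant link
            else:
                dest = (dest + (1 << i)) % N   # negative dominant link
        cells.append(dest)
    return cells


def stays_between(n):
    """Check that every revised value lies between S and D, inclusive."""
    N = 1 << n
    for S in range(N):
        for D in range(N):
            cells = natural_cells(S, D, n)
            if cells[-1] != S:
                return False
            if any(not (min(S, D) <= c <= max(S, D)) for c in cells):
                return False
    return True
```

Because each revised value never crosses S, the sign of dest(j) - j is the same as the sign of D - S whenever the two differ, which is why REQUEST can test the revised value instead of the original destination.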

The time complexity of the algorithm is $O(N \log_2 N)$ in all cases. The space complexity is $O(N)$; in particular, two N-element vectors are required.

9.3 Conclusions

Several routing tag schemes for the ADM in an SIMD environment were reviewed in this chapter. Full routing tags, which allow unrestricted use of the capabilities of the ADM network, were described and the difficulty of generating such tags was noted. Natural, positive dominant, and negative dominant permutation routing tags, which are easily computed, were defined.

An algorithm, in three versions, was given to determine the passability of an arbitrary overall permutation using any of these three types of more easily computed tags. Algorithm operation was described and the computed determination of permutation passability shown to be correct.

CHAPTER 10

FURTHER  PROPERTIES  OF  THE  AUGMENTED  DATA  MANIPULATOR  NETWORK 

10.1  Introduction 

This chapter continues the development of ADM network properties. Group theory [MCC], an area of abstract algebra, is used to prove the theorems which are presented. The results obtained further characterize the permuting ability of the ADM network.

10.2  Definitions  and  Notation 

The results to be presented make use of the following terminology and notation from group theory. For some permutation f, $f^{-1}$ is the inverse of f, that is, if f maps (connects) a source S to a destination D, then $f^{-1}$ maps D to S. This holds for all source/destination pairs of f.

A permutation, f, on the set of PE addresses, {0, 1, ..., N-1}, can be represented in Cauchy notation as

$$\begin{pmatrix} 0 & 1 & \cdots & N-1 \\ f(0) & f(1) & \cdots & f(N-1) \end{pmatrix}$$

where the top line is the source and the bottom line is the destination to which f maps the source.

The permutation f can be represented as a product of cycles, where the cycle

$$(j_0 \;\; j_1 \;\; j_2 \;\; \ldots \;\; j_{x-1} \;\; j_x)$$

means $f(j_0) = j_1$, $f(j_1) = j_2$, ..., $f(j_{x-1}) = j_x$, and $f(j_x) = j_0$. The physical interpretation of this cycle is that network input $j_0$ is connected to network output $j_1$, input $j_1$ is connected to output $j_2$, ..., input $j_{x-1}$ is connected to output $j_x$, and input $j_x$ is connected to output $j_0$. The length of this cycle is x+1; it is called an (x+1)-cycle. A transposition is a cycle of length two. For example, if f is written in Cauchy notation as

$$\begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 1 & 2 & 3 & 0 & 4 & 5 & 7 & 6 \end{pmatrix}$$

then it can be written as the product of the four cycles:

(0 1 2 3)(4)(5)(6 7).

A permutation is even or odd depending on whether it can be expressed as a product of an even or odd number of transpositions, respectively. For example, the permutation represented as (0 1 2) can be written as (1 2)(0 2), where the product is formed from right to left, as is the customary notation, so (0 1 2) is even. Alternately, a k-cycle can be shown to be even or odd as k is odd or even.
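The cycle notation and parity rule can be made concrete with a small sketch. The function names `cycle_decomposition` and `is_even` are illustrative, and a permutation is assumed to be given as a list `f` whose entry `f[s]` is the destination of source `s` (the bottom line of the Cauchy notation).

```python
def cycle_decomposition(f):
    """Return the cycles of f as tuples, in order of smallest element."""
    seen = [False] * len(f)
    cycles = []
    for start in range(len(f)):
        if seen[start]:
            continue
        cycle = []
        x = start
        while not seen[x]:
            seen[x] = True
            cycle.append(x)
            x = f[x]                 # follow source -> destination
        cycles.append(tuple(cycle))
    return cycles


def is_even(f):
    """A k-cycle is a product of k-1 transpositions; sum them over all cycles."""
    transpositions = sum(len(c) - 1 for c in cycle_decomposition(f))
    return transpositions % 2 == 0
```

Applied to the example permutation with bottom line 1 2 3 0 4 5 7 6, the sketch returns the cycles (0 1 2 3), (4), (5), and (6 7). That permutation is even, since a 4-cycle and a transposition are each odd.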

An element of a cycle is any network address contained in the cycle. Two cycles are disjoint if they have no elements in common. The cycles (0 1 6) and (7 3) are disjoint, for example. Any permutation can be written as a unique product of disjoint cycles. This is the disjoint cycle decomposition of the permutation.

10.3 Theoretical Results

This  section  considers  further  properties  of  the  ADM  network. 

Lemma 10.1: Any 1-cycle in an overall permutation must be routed using the straight connection on every stage.

Proof: Proof by contradiction. Assume a PM2I link is used in stage 0. This must give a mapping to a destination which differs from the source in bit 0, since only stage 0 can affect this bit, contradicting the fact that a 1-cycle maps the source to itself. So the straight connection must be used in stage 0. Using a PM2I link in stage 1 must give a mapping to a destination differing from the source in bit 1, because only stage 1 can affect bit 1 once stage 0 is fixed. Continuing this chain of reasoning shows each stage must use the straight link.

□

In [SI6] it is shown that in the ith stage of the ADM network, $0 \le i < n$, the transfer of data from the stage i input cell j can be represented by only one of five possible cycles. The cycles are $(j)$, $(j \;\; j+2^i \;\; j+2 \cdot 2^i \;\; j+3 \cdot 2^i \;\; \ldots \;\; j+N-2^i)$, $(j+N-2^i \;\; \ldots \;\; j+3 \cdot 2^i \;\; j+2 \cdot 2^i \;\; j+2^i \;\; j)$, $(j \;\; j+2^i)$, or $(j \;\; j-2^i)$, where all arithmetic is modulo N, $0 \le j < N$. The first of these is a 1-cycle. The network can perform any 1-cycle at any stage. The next two can be found to be $2^{n-i}$-cycles, since each contains $N/2^i$ elements. These specific $2^{n-i}$-cycles are called network implemented $2^{n-i}$-cycles since they are directly implemented by stage i. The last two cycles listed are transpositions. These are called network implemented transpositions. Collectively, the cycles which can be directly implemented by a stage are referred to as network implemented.
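The five cycles available to a stage i input cell can be enumerated directly. In this minimal sketch the function name `stage_cycles` and the dictionary keys are hypothetical labels chosen for illustration; the arithmetic follows the description above, with all sums taken modulo N.

```python
def stage_cycles(j, i, n):
    """Network implemented cycles for stage i input cell j, with N = 2**n."""
    N = 1 << n
    step = 1 << i
    # Repeatedly adding 2**i returns to j after N / 2**i steps.
    up = tuple((j + k * step) % N for k in range(N // step))
    # The same elements listed in the opposite direction (repeated -2**i).
    down = tuple(reversed(up))
    return {
        "straight": (j,),
        "plus_cycle": up,
        "minus_cycle": down,
        "plus_transposition": (j, (j + step) % N),
        "minus_transposition": (j, (j - step) % N),
    }
```

For N = 8, `stage_cycles(0, 1, 3)` gives the long cycles (0 2 4 6) and (6 4 2 0), each of length 4, and the transpositions (0 2) and (0 6). At stage n-1 the long cycles have length two and coincide with the transpositions.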


The relationship between ADM network structure and the cycle decomposition of a permutation is considered in the following theorem and corollary.

Theorem 10.1: In the ADM network all cycles of length greater than one which are part of the unique disjoint cycle decomposition of a passable overall permutation must be expressible in terms of a product of network implemented transpositions and/or $2^{n-i}$-cycles with elements limited to those of the cycles of length greater than one.

Proof: From Lemma 10.1, the network must be set to straight at each stage i cell with the same address as any 1-cycle in the overall permutation. Thus, the element of any 1-cycle cannot appear as an element in any k-cycle, for k > 1, of the passable overall permutation.

□

Corollary 10.1: An overall permutation consisting of a single transposition is passable by the ADM network if and only if it is a network implemented transposition, i.e., of the form $(j \;\; j \pm 2^i$ modulo $N)$, where $0 \le j < N$ and $0 \le i < n$.

Proof: From Theorem 10.1, only the network implemented cycles whose elements are the same as those of the given single transposition can be used to pass the permutation. That is, if more than two cells use PM2I-type links then N-2 routes of only straight links cannot exist. Thus, if an overall permutation consisting of a single transposition is passable then that transposition is network implemented.

If an overall permutation consists of a single transposition which is network implemented then, clearly, it is passable.

□

$S_N$ is the permutation group for N elements. That is, $S_N$ is the set of all permutations of N elements. $A_N$ is the alternating group on N objects and consists of all even permutations of the N items.

Theorem 10.2: The ADM network cannot perform all 3-cycles in one pass for $N \ge 8$.

Proof: Consider the 3-cycle (0 1 6). To be passable it must decompose into network implemented cycles with elements in the set {0,1,6} (see Theorem 10.1). The network implemented cycles must be of length three or shorter since there are only three allowable elements. So only network implemented transpositions may be used.

The possible transpositions with elements in the set {0,1,6} are (0 1), (0 6), and (1 6). Network implemented transpositions are of the form $(j \;\; j \pm 2^i$ modulo $N)$, so (1 6) is not implemented for any N. The cycle (0 6) is implemented only when N = 8. When N = 8, $0 - 2^1$ modulo 8 = 6, so (0 6) is implemented. For N < 8, (0 6) has an element, 6, outside the range of source addresses, 0 to N-1. For N > 8 (i.e., n > 3), $0 + 2^i$ modulo $N \ne 6$, for $0 \le i < n$, and

$$\begin{aligned}
0 - 2^i \text{ modulo } N &= 2^n - 2^i \\
&= \sum_{k=0}^{n-1} 2^k - \sum_{k=0}^{i-1} 2^k \\
&= \sum_{k=i}^{n-1} 2^k \\
&\ne 6,
\end{aligned}$$

since $6 = 2^2 + 2^1$ would require i = 1 and n = 3.

Thus for N > 8, (0 1 6) clearly cannot be performed since only (0 1) can be performed of the three possible transpositions on the set {0,1,6}. For N = 8, the transpositions (0 1) and (0 6) can be performed. However, (0 6) is performed in stage 1 and (0 1) in stage 0 only. So the network can only implement the product (0 1)(0 6) and not (0 6)(0 1) since stage order is fixed. But (0 1)(0 6) = (0 6 1) $\ne$ (0 1 6). The ADM cannot, then, pass all 3-cycles.

□
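The two computational facts used in this proof can be checked with a short sketch. The helper names `is_network_transposition`, `cycle_to_map`, and `compose` are hypothetical; products are formed from right to left, so `compose(f, g)` applies `g` first.

```python
def is_network_transposition(a, b, n):
    """True if (a b) has the form (j, j +/- 2**i modulo N) for some stage i."""
    N = 1 << n
    return a != b and any(
        (a + (1 << i)) % N == b or (a - (1 << i)) % N == b
        for i in range(n)
    )


def cycle_to_map(cycle, N):
    """Permutation on 0..N-1 (as a list) that performs the given cycle."""
    mapping = list(range(N))
    for k, a in enumerate(cycle):
        mapping[a] = cycle[(k + 1) % len(cycle)]
    return mapping


def compose(f, g):
    """Right-to-left product f g: apply g first, then f."""
    return [f[g[x]] for x in range(len(g))]
```

With N = 8, the transpositions (0 1) and (0 6) are reported as network implemented and (1 6) is not; with N = 16, (0 6) is no longer implemented. Composing `cycle_to_map((0, 1), 8)` with `cycle_to_map((0, 6), 8)` yields the cycle (0 6 1), not (0 1 6).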


Theorem 10.3: For $N \ge 8$ the ADM network does not pass $A_N$.

Proof: From Theorem 10.2 the ADM does not pass all 3-cycles for $N \ge 8$. $A_N$ contains all even permutations of N elements and a 3-cycle is an even permutation. Thus the ADM does not pass every element of $A_N$.

□


10.4 Conclusions

This chapter presented further properties of the ADM network in an SIMD environment. The results which were stated aid in characterizing the family of permutations passable by the ADM.

CHAPTER 11
CONCLUSIONS 


This work is a study of various aspects of the ADM network which influence its suitability for use in SIMD parallel processing systems. The first aspect considered was the number of permutations passable by the ADM network. Next a routing tag scheme that has been developed for distributed control of the ADM network was described so that its performance in an SIMD environment can be investigated. Then algorithms for determining permutation passability using these routing tags were presented and analyzed. Finally, some additional theoretical results were developed.

SIMD machine models were described in Chapter 2 to provide a background in which to evaluate the ADM network. The role of the interconnection network in SIMD machines was outlined for two system architectures. Some basic requirements and limitations for each structure were noted.

Chapter 3 introduced PASM, a partitionable SIMD/MIMD multimicroprocessor system. Unique features of PASM include being 1) dynamically reconfigurable; 2) able to operate in either SIMD or MIMD mode of parallelism; and 3) able to be partitioned into machines of different sizes, each of which may operate in SIMD or MIMD mode. Two interconnection networks are being considered for use in PASM: the Generalized Cube and the ADM. The Generalized Cube network is better understood than the ADM network. Increased knowledge of ADM properties will allow a more informed decision on the interconnection network for PASM, and other parallel processing systems, to be made.

The Generalized Cube and ADM networks were formally defined in Chapter 4. The Generalized Cube was noted to be representative of a class of cube-type networks, including the STARAN flip, the omega, the indirect binary n-cube, and the SW-banyan (S=F=2) networks. The ADM was shown to be derived from the data manipulator.

The number of distinct permutations passable by the Generalized Cube was given in Chapter 5. The procedure used to obtain this result relies on the one-to-one correspondence between permutations and legitimate network settings. Both the count of permutations and this one-to-one correspondence were used later for comparative purposes.

Chapter 6 considered the number of permutations performable by the ADM network. A method was given for counting the number of settings which are permutations for any stage. Using partitioning theory and combinatorial mathematics, upper and lower bounds were established on the number of overall permutations which an ADM network can perform. To assess the characteristics of the bounds, their tightness and asymptotic behavior were studied. For an ADM network with eight inputs, an exact count of the number of passable overall permutations was proven. Lastly, the number of ADM passable permutations was compared with that of the Generalized Cube.

The use of routing tags for distributed control of interconnection networks was introduced in Chapter 7. The permuting ability of the Generalized Cube when used with routing tags was discussed. The results obtained were used for comparison with the ADM network.

In Chapter 8 several routing tag schemes which allow distributed control of the ADM network were reviewed. The number of permutations performable using either positive dominant or negative dominant permutation routing tags was counted.

Chapter 9 presented an algorithm which can determine if an arbitrary overall permutation is passable using either natural, positive dominant, or negative dominant permutation routing tags. Correct operation of the algorithm was demonstrated and its complexity stated.

Further properties of the ADM network can be derived using group theory. The results which were presented in Chapter 10 aid in characterizing the family of permutations passable by the ADM network.

Choosing the interconnection network for a parallel processing system such as PASM is an important and difficult design task for the system architect. A satisfactory compromise among many interconnection network parameters including, among others, permuting capability and performance with distributed control, must be reached. Analyses such as those presented here are necessary in order to evaluate the cost-effectiveness of the ADM as an SIMD interconnection network.